COMPOSITE BINDING POLYPEPTIDES

TECHNICAL FIELD 

The present disclosure is in the fields of molecular biology and protein design; in
particular, the design of sequence-specific binding proteins for regulation of gene
expression.



BACKGROUND



Protein-nucleic acid recognition is a commonplace phenomenon that is central to a large
number of biomolecular control mechanisms that regulate the functioning of eukaryotic
and prokaryotic cells. For instance, protein-DNA interactions form the basis of the
regulation of gene expression and are thus one of the subjects most widely studied by
molecular biologists.

A wealth of biochemical and structural information explains the details of protein-DNA
recognition in numerous instances, to the extent that general principles of recognition
have emerged. Many DNA-binding proteins contain independently folded domains for
the recognition of DNA, and these domains in turn belong to a large number of structural
families, such as the leucine zipper, the "helix-turn-helix" and zinc finger families.

Despite the great variety of structural domains, the specificity of the interactions observed
to date between protein and DNA most often derives from the complementarity of the
surfaces of a protein α-helix and the major groove of DNA. See, e.g., Klug, (1993) Gene
135:83-92. In light of the recurring physical interaction of α-helix and major groove, the
tantalising possibility arises that the contacts between particular amino acids and DNA
bases could be described by a simple set of rules; in effect a stereochemical recognition
code which relates protein primary structure to binding-site sequence preference.

It is clear, however, that no code will be found which can describe DNA recognition by
all DNA-binding proteins. The structures of numerous complexes show significant
differences in the way that the recognition α-helices of DNA-binding proteins from
different structural families interact with the major groove of DNA, thus precluding
similarities in patterns of recognition. The majority of known DNA-binding motifs are
not particularly versatile, and any codes which might emerge would likely describe
binding to a very few related DNA sequences.

Even within each family of DNA-binding proteins, moreover, it has hitherto appeared
that the deciphering of a code would be elusive. Due to the complexity of the
protein-DNA interaction, there does not appear to be a simple "alphabetic" equivalence
between the primary structures of protein and nucleic acid which specifies a direct amino
acid to base relationship.

International patent application WO 96/06166 addresses this issue and provides a
"syllabic" code that explains protein-DNA interactions for zinc finger nucleic acid
binding proteins. A syllabic code is a code that relies on more than one feature of the
binding protein to specify binding to a particular base, the features being combinable in
the forms of "syllables", or complex instructions, to define each specific contact. Segal,
D. J., Dreier, B., Beerli, R. R. & Barbas, C. F. (1999) Proc. Natl. Acad. Sci. USA 96,
2758-2763 present a method of constructing zinc finger polypeptides, based on 16
individual zinc finger domains which bind sequences of the form 5'-GXX-3', where X is
any base. See also U.S. Patent No. 6,140,081. The latter method has the severe
limitation that it does not provide instructions permitting the specific targeting of triplets
containing nucleotides other than G in the 5' position of each triplet, which greatly
restricts the potential target sequences of such generated zinc finger peptides.

International patent application WO 98/53057 addresses the above problems by
recognizing that zinc fingers can specify overlapping 4 bp subsites, and therefore synergy
between adjacent zinc finger domains is an important consideration in selecting zinc
finger nucleic acid-binding domains to specifically target any sequence.

With the recent completion of the human genome project and the rapidly advancing fields
of transgenic animals and plants, thousands of uncharacterised (and characterised) genes
have become (and will become) valid targets for functional genomics and other such
projects. Concomitantly, 'designer' zinc finger peptides are emerging as one of the most
universal and desirable ways of regulating the expression of specific genes within cells.
See, for example, Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372: 642-645;
Beerli, R. R., Dreier, B. & Barbas, C. F. III (2000) Proc. Natl. Acad. Sci. USA 97:
1495-1500; Kim, J-S. & Pabo, C. O. (1998) Proc. Natl. Acad. Sci. USA 95: 2812-2817;
Kang, J. S. & Kim, J-S. (2000) J. Biol. Chem. 275: 8742-8748; Zhang et al. (2000)
J. Biol. Chem. 275:33,850-33,860; Liu et al. (2001) J. Biol. Chem. 276:11,323-11,334;
and Ren et al. (2002) Genes Devel. 16:27-32. See also WO 00/41566 and WO 01/19981.
Hence, a rapid method of creating multi-zinc finger peptides for the up- or
down-regulation of any specific gene is highly desirable.

As stated above, synergy between adjacent zinc finger peptides is an important factor in
specific DNA recognition. Moreover, the findings reported in co-owned WO 01/53480,
which is hereby incorporated by reference, demonstrate that poly-zinc finger peptides
constructed from strings of 2-finger domains can provide greater DNA binding
specificity.

Traditional strategies of zinc finger mutagenesis and selection, such as phage display,
particularly if employed for the selection of 2-zinc finger units to target any desired
binding site, are limited by the size of the library that can be cloned into host/vector
systems, such as phage. Due to limitations in library size imposed by such constraints, it
is impossible to include an exhaustive combination of randomisations to cover all
potentially important sequence-space. Furthermore, for important applications of
engineered zinc finger peptides, such as for gene therapy or transgenic animal systems,
engineered zinc finger peptides run the significant risk of eliciting a harmful
immunological reaction in the host animal.

The human genome sequencing project has also revealed the presence of almost 700
endogenous zinc finger-containing proteins. Assuming that each of these proteins
contains at least 2 finger modules, there are probably at least 2,000 natural zinc finger
modules in the human genome alone. Similar numbers are expected in other animal and
plant genomes.

SUMMARY

The present invention recognises the potential importance of designer zinc finger peptides 
in therapeutic and transgenic applications in animals and plants. Furthermore the present 
invention acknowledges that the safety of such applications is of primary importance. 

The present invention provides for the isolation of natural zinc finger modules, from
genomes such as human, mouse, chicken, Arabidopsis and other species, and the
construction of non-natural combinations of such zinc finger modules, to create
multi-finger domains, and to provide and determine novel nucleic acid binding
specificities.

Such a procedure will allow the identification of novel zinc finger domains that bind
any desired nucleic acid sequence, particularly sequences between 6 and 10
nucleotides long. The first advantage of such technology is that millions of years of
natural evolution, to create specific nucleotide-binding zinc finger modules, are captured
to create novel nucleic acid-binding domains. Also, use of poly-zinc finger peptides
constructed from such units for targeted gene regulation avoids the potentially harmful
effects of host immune responses. The present invention thus greatly enhances the
possibilities for the use of zinc finger transcription factors for in vivo applications, such as
gene therapy and transgenic animals.

In a first aspect, therefore, there is provided a composite binding polypeptide comprising
a first natural binding domain derived from a first natural binding polypeptide, and a
second natural binding domain derived from a second natural binding polypeptide,
wherein said first and second natural binding polypeptides may be the same or different;
which polypeptide binds to a target, said target differing from the natural target of
both the first and the second binding polypeptides.

Preferably, said first and second natural binding polypeptides are different polypeptides.

Binding polypeptides according to the invention comprise two or more natural binding
domains, advantageously three or more natural binding domains; advantageously, six or
more domains are included. These are preferably arranged in a 3x2 conformation,
separated by linker sequences.

The binding domains are preferably nucleic acid binding domains, and the composite
polypeptide is preferably a nucleic acid binding polypeptide. Most preferably, the
composite polypeptide is a zinc finger polypeptide, and the natural binding domains are
zinc finger domains.

Zinc finger binding domains can comprise any type of zinc finger or zinc-coordinated
structure including, but not limited to, Cys2-His2 (SEQ ID NO:1) zinc finger binding
domains or Cys3-His (SEQ ID NO:2) zinc finger binding domains.

In a further aspect, there is provided a library of natural binding domains. The natural 
binding domains are the domains that may be assembled into polypeptides according to 
the previous aspect of the invention. Preferably, the library is of natural zinc finger 
nucleic acid binding domains. 

Said zinc finger domains may comprise a linker attached thereto. Any linker amino acid 
sequence known in the art can be used. Advantageously, the linker comprises the amino 
acid sequence TGEKP (SEQ ID NO:3). 

In a further aspect, the invention provides a method for selecting a binding polypeptide
capable of binding to a target site, comprising:

(a) providing a library of natural binding domains;

(b) assembling two or more of said domains to form a composite polypeptide;

(c) screening said composite polypeptide against the target site in order to
determine its ability to bind the target site.

Preferably, the natural binding domains are zinc finger binding domains.

Furthermore, the invention provides methods for designing a composite binding
polypeptide, comprising:

(a) providing information defining a target site; 

(b) selecting, from a database of natural binding domains, a sequence of binding 
domains, separated by linker sequences, which is predicted to bind to the target site; 

(c) displaying the sequence of binding domains and linkers and optionally 
assembling the binding polypeptide from a library of said domains. 

In certain embodiments, the binding domains are zinc finger domains. In certain 
embodiments, a binding domain sequence that will bind a particular target site is 
predicted by the application of one or more rules that define target binding interactions 
for the binding domains. In additional embodiments, a nucleotide sequence encoding the 
binding domains is assembled and introduced into a cell such that the composite binding 
polypeptide is expressed. 

In one embodiment, zinc fingers can be considered to bind to a nucleic acid triplet, in 
which case domains can be selected according to one or more of the following rules: 

(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if the central base in the triplet is G, then position +3 in the α-helix is His;

(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;

(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser
or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;



(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;

(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln;

(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
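
By way of illustration only, the triplet rules (a) to (l) can be expressed as a set of
predicates. The following is a minimal sketch, not the disclosed implementation. It
assumes that the residue at the position written "++2" is supplied by the caller, that
"small residue" means Gly, Ala or Ser (the rules do not define the term), and that a
finger must satisfy the rules for all three bases, which is only one reading of "one or
more of the following rules". All function and variable names are hypothetical.

```python
# Illustrative sketch of triplet rules (a)-(l); names and the SMALL_RESIDUES
# set are assumptions, not part of the disclosure.
SMALL_RESIDUES = {"Gly", "Ala", "Ser"}  # assumption: "small" is not defined in the rules


def five_prime_ok(base: str, pos6: str, pp2: str) -> bool:
    """Rules (a)-(d): 5' base versus position +6 and the position written ++2."""
    if base == "G":
        return pos6 == "Arg" or (pos6 in {"Ser", "Thr"} and pp2 == "Asp")
    if base == "A":
        return pos6 == "Gln" and pp2 != "Asp"
    if base == "T":
        return pos6 in {"Ser", "Thr"} and pp2 == "Asp"
    if base == "C":
        return pp2 != "Asp"  # any residue is permitted at +6
    raise ValueError(f"unexpected base: {base!r}")


def central_ok(base: str, pos3: str, pos_m1: str, pos6: str) -> bool:
    """Rules (e)-(h): central base versus position +3."""
    if base == "G":
        return pos3 == "His"
    if base == "A":
        return pos3 == "Asn"
    if base == "T":
        if pos3 == "Ala":  # proviso of rule (g)
            return pos_m1 in SMALL_RESIDUES or pos6 in SMALL_RESIDUES
        return pos3 in {"Ser", "Val"}
    if base == "C":
        return pos3 in {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}
    raise ValueError(f"unexpected base: {base!r}")


def three_prime_ok(base: str, pos_m1: str) -> bool:
    """Rules (i)-(l): 3' base versus position -1."""
    allowed = {"G": {"Arg"}, "A": {"Gln"}, "T": {"Asn", "Gln"}, "C": {"Asp"}}
    if base not in allowed:
        raise ValueError(f"unexpected base: {base!r}")
    return pos_m1 in allowed[base]


def finger_matches_triplet(triplet: str, pos_m1: str, pos3: str, pos6: str, pp2: str) -> bool:
    """True if the helix residues satisfy the rules for every base of a 5'->3' triplet."""
    five, central, three = triplet.upper()
    return (
        five_prime_ok(five, pos6, pp2)
        and central_ok(central, pos3, pos_m1, pos6)
        and three_prime_ok(three, pos_m1)
    )
```

For example, a helix with Arg at -1, Glu at +3 and Arg at +6 satisfies the rules for the
triplet GCG in this sketch, whichever non-Asp residue is supplied for ++2.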

In a further embodiment, the zinc fingers can be considered to bind to a nucleic acid
quadruplet and domains can be selected according to one or more of the following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or
Val;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val
or Lys;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val,
Ala, Glu or Asn;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His;

(m) if base 1 in the quadruplet is G, then position +2 is Glu;

(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln;

(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
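
Because each of quadruplet rules (a) to (p) pairs one base with one helix position, the
rules lend themselves to a lookup table. The sketch below is illustrative only. Bases are
passed keyed by the numbers 1 to 4 used in the rules, so that no assumption is made about
strand orientation; the set of "small" residues (Gly, Ala, Ser) and all names are
assumptions introduced for the example.

```python
# Lookup-table form of quadruplet rules (a)-(p); illustrative only.
# base number -> (helix position, {base: residues permitted at that position})
RULE_TABLE = {
    4: ("+6", {"G": {"Arg", "Lys"},
               "A": {"Glu", "Asn", "Val"},
               "T": {"Ser", "Thr", "Val", "Lys"},
               "C": {"Ser", "Thr", "Val", "Ala", "Glu", "Asn"}}),
    3: ("+3", {"G": {"His"},
               "A": {"Asn"},
               "T": {"Ala", "Ser", "Val"},
               "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}}),
    2: ("-1", {"G": {"Arg"},
               "A": {"Gln"},
               "T": {"His", "Thr"},
               "C": {"Asp", "His"}}),
    1: ("+2", {"G": {"Glu"},
               "A": {"Arg", "Gln"},
               "C": {"Asn", "Gln", "Arg", "His", "Lys"},
               "T": {"Ser", "Thr"}}),
}

SMALL_RESIDUES = {"Gly", "Ala", "Ser"}  # assumption: "small" is not defined in the rules


def finger_matches_quadruplet(bases: dict, helix: dict) -> bool:
    """bases: {1: 'T', 2: 'G', 3: 'C', 4: 'G'}; helix: {'-1': ..., '+2': ..., '+3': ..., '+6': ...}."""
    for number, (position, allowed) in RULE_TABLE.items():
        if helix[position] not in allowed[bases[number]]:
            return False
    # Proviso of rule (g): Ala at +3 opposite T requires a small residue at -1 or +6.
    if bases[3] == "T" and helix["+3"] == "Ala":
        return helix["-1"] in SMALL_RESIDUES or helix["+6"] in SMALL_RESIDUES
    return True
```

A table of this kind can be swapped for another rule set without changing the matching
function, which is the main reason for choosing a data-driven layout in this sketch.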

In a preferred embodiment, zinc fingers are considered to bind to a nucleic acid
quadruplet and domains are selected according to one or more of the following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp;

(m) if base 1 in the quadruplet is G, then position +2 is Asp;

(n) if base 1 in the quadruplet is A, then position +2 is not Asp;

(o) if base 1 in the quadruplet is C, then position +2 is not Asp;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

Two or more composite polypeptides comprising two or more domains which are 
selected for binding to two or more target sites can be combined to provide a composite 
polypeptide which binds to an aggregate binding site comprising the two or more target 
binding sites. 



In a still further aspect, the invention provides a computer-implemented method for
designing a zinc finger polypeptide that binds to a target nucleic acid sequence,
comprising the steps of:

(a) providing a system comprising at least storage means for storing data relating
to a library of zinc fingers; storage means for storing a rule table; means for inputting
target nucleic acid sequence data; processing means for generating a result; and means for
outputting the result;

(b) inputting sequence data for a target nucleic acid molecule;

(c) defining a first target zinc finger binding site in said nucleic acid molecule;

(d) interrogating the zinc finger library and rule table storage means, comparing
zinc fingers to the target zinc finger binding site according to the rule table and selecting
zinc finger data identifying a zinc finger capable of binding to said target site;

(e) defining at least one further target zinc finger binding site and repeating step
(d); and

(f) outputting the selected zinc finger data.

Such a method may further comprise sending instructions to an automated chemical 
synthesis system to assemble a zinc finger polypeptide as defined by the zinc finger data 
obtained in (f). 
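
Steps (b) to (f) of the computer-implemented method can be sketched as a short
selection loop. The following is a hedged illustration under stated assumptions, not the
disclosed system: the library is a plain list, binding sites are taken to be contiguous
non-overlapping triplets, and the rule is passed in as a function. The `Finger` class,
`toy_rule` (which checks only the central-base rules for G and A) and the finger names
are hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, Dict, List, Tuple


@dataclass
class Finger:
    """Hypothetical library record: a name plus residues at helix positions."""
    name: str
    helix: Dict[str, str]  # e.g. {"-1": "Arg", "+3": "His", "+6": "Arg"}


Rule = Callable[[str, Finger], bool]


def split_sites(target: str, size: int = 3) -> List[str]:
    """Steps (c) and (e): define successive binding sites (assumed contiguous triplets)."""
    if len(target) % size != 0:
        raise ValueError("target length must be a multiple of the site size")
    return [target[i:i + size] for i in range(0, len(target), size)]


def design(target: str, library: List[Finger], rule: Rule) -> List[Tuple[str, List[str]]]:
    """Steps (b)-(f): for each site, select library fingers that satisfy the rule."""
    result = []
    for site in split_sites(target.upper()):
        hits = [finger.name for finger in library if rule(site, finger)]  # step (d)
        result.append((site, hits))
    return result  # step (f): the selected zinc finger data


def toy_rule(site: str, finger: Finger) -> bool:
    """Toy stand-in for a rule table: central base G -> His, A -> Asn at +3 only."""
    central = {"G": {"His"}, "A": {"Asn"}}
    return finger.helix.get("+3") in central.get(site[1], set())


library = [Finger("fingerA", {"+3": "His"}), Finger("fingerB", {"+3": "Asn"})]
selection = design("GGATAA", library, toy_rule)
```

In a fuller system the rule function would consult the complete rule table, and the
output could feed the functional testing described elsewhere herein.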

In additional embodiments, the sequence of one or more oligonucleotides encoding a 
composite binding polypeptide can be determined from the sequence of a composite 
binding polypeptide, and the one or more oligonucleotides can be synthesized by any 
number of well-known methods. 

Preferably, a composite binding polypeptide is tested for binding to a target sequence, and 
data from said testing is used to select, from a plurality of possibilities, a composite 
binding polypeptide that binds with optimal affinity and specificity to the target site. 

Advantageously, two or more zinc finger polypeptides are combined to form a zinc finger
polypeptide capable of binding to an aggregate binding site comprising two or more
target sites.

The rule table preferably comprises rules as set forth above.

BRIEF DESCRIPTION OF THE FIGURES 

Figure 1 shows a flowchart depicting part of the logic used in the selection of zinc
fingers from a natural library in accordance with the invention. The logic set forth in
Figure 1 may be supplemented, for example using rules relating to zinc finger overlap.
Functional testing of zinc fingers for binding to the desired binding site may be
implemented in an automated fashion and integrated with the zinc finger design system.

Figure 2 is a schematic representation of the human zinc finger mini-library construction 
procedure. Synthetic zinc finger coding oligonucleotides are assembled into full-length 
ds expression constructs by overlap PCR. 

Figure 3 is a schematic representation of the fluorescent ELISA assay used to detect zinc
finger peptides bound to double stranded DNA target sites. Streptavidin (7), biotinylated
DNA target (5) linked to biotin (6), 3-finger peptide (4) fused to HA-tag (3), anti-HA
antibody (2) fused to horseradish peroxidase (HRP, 1).

Figure 4 depicts ELISA scores of 384 library 2 constructs screened against the
5'-GCG-TGG-GCG-3' (SEQ ID NO:4) target site. Six constructs showed significant binding,
and are termed C8, G16, I19, I23, J19 and K19, according to their coordinates on the
384-well plate.

Figure 5 depicts ELISA scores of selected library 2 members; B10, C8, G16, I23, J19,
and K19, against different DNA target sites. The sequences of the target sites are (from
back of graph to front): 5'-GCG-TGG-GCG-3' (SEQ ID NO:5); 5'-CCA-CTC-GGC-3'
(SEQ ID NO:6); 5'-CCT-AGG-GGG-3' (SEQ ID NO:7); 5'-GGA-TAA-GCG-3' (SEQ
ID NO:8); 5'-GGG-AGG-CCT-3' (SEQ ID NO:9); 5'-GCG-TAA-GGA-3' (SEQ ID
NO:10); 5'-GCG-GGG-GGA-3' (SEQ ID NO:11); and no DNA control (front row).

Figure 6 depicts a schematic representation of the 3-zinc finger library constructed
according to the procedure described in Example 2.

DETAILED DESCRIPTION 

Definitions 

Unless defined otherwise, all technical and scientific terms used herein have the same
meaning as commonly understood by one of ordinary skill in the art (e.g., in cell culture,
molecular genetics, nucleic acid chemistry, hybridisation techniques and biochemistry).
The practice of the present invention will employ, unless otherwise indicated,
conventional techniques of chemistry, molecular biology, microbiology, recombinant
DNA, immunology, chemical methods, pharmaceutical formulations and delivery and
treatment of patients, which are within the capabilities of a person of ordinary skill in the
art. Such techniques are explained in the literature. See, for example, J. Sambrook, E. F.
Fritsch, and T. Maniatis, 1989, Molecular Cloning: A Laboratory Manual, Second
Edition, Books 1-3, Cold Spring Harbor Laboratory Press; Ausubel, F. M. et al. (1995
and periodic supplements; Current Protocols in Molecular Biology, ch. 9, 13, and 16,
John Wiley & Sons, New York, N.Y.); B. Roe, J. Crabtree, and A. Kahn, 1996, DNA
Isolation and Sequencing: Essential Techniques, John Wiley & Sons; J. M. Polak and
James O'D. McGee, 1990, In Situ Hybridisation: Principles and Practice, Oxford
University Press; M. J. Gait (Editor), 1984, Oligonucleotide Synthesis: A Practical
Approach, IRL Press; and, D. M. J. Lilley and J. E. Dahlberg, 1992, Methods of
Enzymology: DNA Structure Part A: Synthesis and Physical Analysis of DNA Methods in
Enzymology, Academic Press. Each of these general texts is herein incorporated by
reference.

The term "library" is used according to its common usage in the art, to denote a collection
of different polypeptides or, preferably, a collection of nucleic acids encoding different
polypeptides. The libraries of natural zinc finger peptides referred to herein comprise or
encode a repertoire of polypeptides of different sequences, each of which has a preferred
binding sequence.

The terms "polypeptide", "peptide" and "protein" are used interchangeably to refer to a
polymer of amino acid residues, preferably including naturally occurring amino acid
residues. Artificial amino acid residues are also within the scope of the invention, but the
exclusive use of naturally-occurring amino acids is preferred in order to maintain the
natural nature of the binding domains. There are 20 common amino acids, each specified
by a different arrangement of three adjacent DNA nucleotides by the genetic code. These
are the building blocks of proteins. The amino acids are joined together in a strictly
ordered chain by peptide bonds, and their sequence determines each polypeptide
molecule. The 20 common amino acids are: alanine, arginine, aspartic acid, glutamic
acid, glutamine, glycine, histidine, isoleucine, leucine, phenylalanine, proline, serine,
threonine, tryptophan, tyrosine, valine, cysteine, methionine, lysine, and asparagine.
Virtually all of these amino acids (except glycine) possess an asymmetric carbon atom,
and thus are potentially chiral in nature.

As used herein, "nucleic acid" includes both RNA and DNA, and nucleic acids 
constructed from natural nucleic acid bases or synthetic bases, or mixtures thereof. 
Modified nucleic acids such as, for example, PNAs and morpholino nucleic acids, are 
also included in this definition. 

A "gene", as used herein, is the segment of nucleic acid (typically DNA) that is involved
in producing a polypeptide chain or ribonucleic acid gene product. It includes regions
preceding and following the coding region (leader and trailer) as well as intervening
sequences (introns) between individual coding segments (exons). Preferably, "gene"
includes the necessary control sequences for gene expression, as well as the coding region
encoding the gene product.

A "binding polypeptide" is a polypeptide capable of binding to a specific target.
Although, as is well known, polypeptides are capable of non-specific binding to a wide
range of substrates, it is also known that certain polypeptides, such as antibodies and
other members of the immunoglobulin superfamily, zinc fingers, leucine zipper
polypeptides, peptide aptamers and the like can bind specifically to target sites or
molecules. Generally, specific binding is preferably achieved with a dissociation constant
(Kd) of 100 µM or lower; preferably 10 µM or better; preferably 1 µM or better; and ideally
0.5pM or better. Binding polypeptides can be nucleic acid binding polypeptides which
bind to nucleic acid in a target sequence-specific manner, such as zinc finger
polypeptides. Unless specifically noted, no difference is intended herein between terms
such as "peptide", "polypeptide" and "protein".

A "natural binding polypeptide" is a binding polypeptide encoded by the genome of a 
living organism such as, for example, a plant or animal. 

A "composite" polypeptide is a polypeptide that is assembled from a plurality of
components. In a preferred embodiment, the invention provides composite binding
polypeptides that are assembled from a plurality of individual natural binding domains as
set forth in detail herein. Typically, such domains are zinc finger nucleic acid binding
domains.

A "natural binding domain" (or module) is a domain of a naturally occurring polypeptide
that is capable of specific binding to a target as defined above. The terms "domain" and
"module", according to their ordinary signification in the art, refer to a discrete
continuous part of the amino acid sequence of a polypeptide that can be equated with a
particular function. Protein domains or modules are largely structurally independent and
can retain their structure and function in different environments. In certain embodiments,
a natural binding domain or module is a zinc finger that binds a triplet or quadruplet
nucleotide sequence.

Preferably, each of the individual natural binding domains that make up a composite
binding polypeptide contains no changes in sequence, as compared to the natural
sequence. However, those skilled in the art will understand that certain changes including
conservative amino acid substitutions, as well as additions or deletions, may be made
without altering the function of a domain. Moreover, where the changes are consistent
with sequences common to the species from which the domain is derived, such as for
example being present in consensus sequences, they are unlikely to give rise to
immunological problems.

Conservative amino acid substitutions may be made, for example according to Table 1.
Amino acids in the same block in the second column and preferably in the same line in
the third column may be substituted for one another:

Table 1

ALIPHATIC    Non-polar            G A P
                                  I L V
             Polar - uncharged    C S T M
                                  N Q
             Polar - charged      D E
                                  K R
AROMATIC                          H F W Y


A domain is "derived" from a protein if it is effectively removed from a
naturally-occurring protein for use in a composite binding polypeptide. Removal may be
physical removal, by cleavage of the protein; more commonly, however, the sequence of
the domain is determined and the domain is synthesised by protein synthesis techniques to
be a copy of the naturally-occurring domain. Alternatively, a nucleic acid encoding the
domain is synthesized and expressed in a cell. In vitro synthesised domains, or in vitro
synthesized polynucleotides encoding naturally-occurring domains, are considered to be
"derived" from the natural protein if they recapitulate the sequence of the
naturally-occurring domain.

A "target" is a molecule or part thereof to which a binding polypeptide or a binding
domain is capable of specific binding. The "natural target" of a binding polypeptide is
the target to which that polypeptide binds in nature; e.g., in a living cell. In the case of
zinc finger polypeptides, for instance, the natural target is the nucleotide sequence to
which the polypeptide binds in a living cell. Sequences other than the natural target, as
defined herein, to which a zinc finger polypeptide may bind in vitro are not natural
targets.

In the case of nucleic acid binding polypeptides, therefore, the term "target" may be
substituted or supplemented with "binding site" or "binding sequence." Where binding
sites are assembled to form larger binding sites, which are bound by multi-domain
binding polypeptides, such binding sites are referred to as "aggregate binding sites",
indicating that they are formed by the juxtaposition of two or more individual binding
sites. The aggregate binding sites can comprise contiguous individual binding sites, or
individual binding sites interspersed by one or more intervening nucleotides or sequence
of nucleotides.
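
As a purely illustrative sketch of the definition above, an aggregate binding site made
of two individual binding sites, either contiguous or separated by a short run of
intervening nucleotides, can be located in a sequence as follows. The function name,
the `max_gap` parameter and the example sequence are assumptions introduced for the
illustration.

```python
from typing import List, Tuple


def find_aggregate_sites(seq: str, first: str, second: str, max_gap: int = 0) -> List[Tuple[int, int]]:
    """Return (start, gap) pairs where `first` is followed by `second`.

    `gap` is the number of intervening nucleotides (0 means contiguous sites);
    only gaps of at most `max_gap` nucleotides are reported.
    """
    hits = []
    start = seq.find(first)
    while start != -1:
        after = start + len(first)
        for gap in range(max_gap + 1):
            if seq.startswith(second, after + gap):
                hits.append((start, gap))
        start = seq.find(first, start + 1)  # continue with the next occurrence
    return hits


# Hypothetical example: two sites separated by two intervening nucleotides.
example = find_aggregate_sites("TTGCGTGGAAGCGTAA", "GCGTGG", "GCG", max_gap=3)
```

Here `example` holds a single hit at position 2 with a gap of 2; with `max_gap=0` only
contiguous arrangements would be reported.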

The present invention relates to naturally-occurring zinc fingers and their use as specific
nucleic acid binding modules in combinations not present in nature. This invention
provides methods of determining and/or predicting the nucleotide binding specificities of
natural zinc finger modules. Also provided are methods of constructing poly-zinc finger
peptides containing at least one natural zinc finger module, from libraries of natural zinc
finger peptides, and methods of screening such peptides to determine their preferred
nucleotide binding specificity. Moreover, the invention provides for the use of
combinations of such natural zinc finger modules in poly-zinc finger peptides not present
in nature, to bind any desired nucleotide sequence.

Poly-zinc finger peptides of this invention may contain 2, 3, 4, 5, 6 or more zinc finger
modules. Natural zinc finger modules of this invention may preferably be linked by
canonical, flexible or structured linkers, as set out below and in WO 01/53480, the
disclosure of which is hereby incorporated by reference. More preferably, the linkers are
canonical linkers such as -TGEKP- (SEQ ID NO:3).

The poly-zinc finger peptides of this invention can be given useful biological functions by
the addition of effector domains, creating chimeric zinc finger peptides. Preferably, such
chimeric zinc finger peptides may be used to up- or down-regulate desired genes, in vitro
or in vivo. Preferable effector domains include transcriptional repressor domains,
transcriptional activator domains, transcriptional insulator domains, chromatin
remodelling domains, enzymatic domains, and signalling / targeting sequences or
domains. To cause a desired biological effect, composite binding polypeptides can bind to
one or more suitable nucleotide sequences in vivo or in vitro. Preferred DNA regions
from which to effect the up- or down-regulation of specific genes include promoters,
enhancers or locus control regions (LCRs). Other suitable regions within genomes,
which may provide useful targets for composite binding polypeptides, include telomeres
and centromeres.

The expression of many genes is also regulated by controlling the fate of the associated
RNA transcript. RNA molecules often contain sites for RNA-binding proteins, which
determine RNA half-life. Hence, composite binding polypeptides can also control
endogenous gene expression by specifically targeting RNA transcripts to either increase
or decrease their half-life within a cell.

Composite binding polypeptides can also be fused to epitope tags, which can be detected
by antibodies, and may therefore be used to signal the presence or location of a particular
nucleotide sequence in a mixed pool of nucleic acids, or immobilised on the surface of a
chip or other such surface.

Intracellular localization of composite binding polypeptides can be regulated, for
example, by fusion to a localization domain, for example, a nuclear localization sequence
or a localization domain as disclosed, for example, in PCT/US01/42377.

a. Nucleic Acid Binding Polypeptides 

This invention preferably relates to nucleic acid binding polypeptides. Preferably, the 
binding polypeptides of the invention are DNA binding polypeptides. Particularly 
preferred examples of nucleic acid binding polypeptides are zinc finger peptides. 

Zinc finger peptides typically contain strings of small nucleic acid binding domains, each
stabilised by the co-ordination of zinc. These individual domains are also referred to as
"fingers" and "modules". A zinc finger recognises and binds to a nucleic acid triplet, or
an overlapping quadruplet, in a DNA target sequence. However, zinc fingers are also
known to bind RNA and proteins. Clemens, K. R. et al. (1993) Science 260: 530-533;
Bogenhagen, D.F. (1993) Mol. Cell. Biol. 13: 5149-5158; Searles, M. A. et al., J. Mol.
Biol. 301: 47-60 (2000); Mackay, J. P. & Crossley, M. (1998) Trends Biochem. Sci. 23:
1-4.

Preferably, there are 2 or more zinc fingers, for example 2, 3, 4, 5, 6, or 7 zinc fingers, in
each zinc finger polypeptide. Advantageously, there are 3 or more zinc fingers in each
zinc finger polypeptide.

All of the DNA binding residue positions of zinc finger peptides, as referred to herein, are
numbered from the first residue in the α-helix of the finger, ranging from +1 to +9. "-1"
refers to the residue in the framework structure immediately preceding the α-helix in a
zinc finger peptide. Residues referred to as "++" are residues present in an adjacent
(C-terminal) peptide. Where there is no C-terminal adjacent peptide, "++" interactions do
not operate.

The α-helix of a zinc finger peptide aligns antiparallel to the target nucleic acid strand,
such that the primary nucleic acid sequence is arranged 3' to 5' in order to correspond
with the N-terminal to C-terminal sequence of the zinc finger peptide. Since nucleic acid
sequences are conventionally written 5' to 3', and amino acid sequences N-terminus to
C-terminus, the result is that when a target nucleic acid sequence and a zinc finger
peptide are aligned according to convention, the primary interaction of the zinc finger
peptide is with the "minus" strand of the nucleic acid sequence, since it is this strand
which is aligned 3' to 5'. These conventions are followed in the nomenclature used
herein. It should be noted, however, that in nature certain zinc finger modules, such as
zinc finger 4 of the protein GLI, bind to the "plus" strand of the nucleic acid sequence.
See Suzuki et al. (1994) Nucl. Acids Res. 22: 3397-3405; and Pavletich & Pabo, (1993)
Science 261: 1701-1707. The present invention encompasses incorporation of such zinc
finger peptides into DNA binding molecules.
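By way of illustration only, the strand convention described above can be sketched in a few lines of Python. The function name is hypothetical, and the sketch assumes that each finger contacts one non-overlapping triplet on the same strand. The GCG specificity of Zif268 fingers 1 and 3 is stated elsewhere in this disclosure; the TGG specificity used for finger 2 is an assumption based on the commonly cited Zif268 site.

```python
def target_site_5_to_3(finger_triplets_n_to_c):
    """Return the bound strand written 5'->3' for fingers listed N- to C-terminus.

    Each triplet is itself written 5'->3'. Because the peptide runs antiparallel
    to the strand it contacts, the N-terminal finger sits on the 3'-most triplet,
    so the order of the triplets (not of the bases within them) is reversed.
    """
    return "".join(reversed(finger_triplets_n_to_c))


# Fingers 1 and 3 of Zif268 bind GCG (from the text); TGG for finger 2 is assumed.
zif268_site = target_site_5_to_3(["GCG", "TGG", "GCG"])

# A deliberately asymmetric, purely hypothetical example makes the reversal visible.
hypothetical_site = target_site_5_to_3(["AAA", "CCC", "GGG"])
```

In the hypothetical example the N-terminal finger (AAA) ends up at the 3' end of the written site, which is the behaviour the nomenclature above implies. Modules that bind the "plus" strand, such as GLI finger 4, are not covered by this sketch.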

Natural Zinc Finger Peptides. 

In certain embodiments, this invention relates to natural zinc finger modules. As used
herein, the term 'natural', with reference to a zinc finger, means that the DNA sequence
which encodes a particular zinc finger, whether normally expressed in vivo or not, is
found in nature, i.e. is part of the genome of a cell. A natural human zinc finger is one
which is endogenous to the human genome, a natural mouse zinc finger is found in the
mouse genome, and a natural viral zinc finger is found in a viral genome, etc. Natural
zinc finger genes which have become integrated into the genome of a heterologous
species by natural means, e.g., integration of a viral genome into a host genome, are
considered to be endogenous to the host species within the context of this disclosure. A
zinc finger module constructed or produced in vitro or extracted from an in vivo source is
considered to be natural if its amino acid sequence matches the amino acid
sequence encoded by its natural gene. The DNA sequence of the natural gene is not the
defining aspect. Thus, polynucleotides encoding natural zinc finger modules may have a
different sequence from that of the naturally-occurring sequence encoding the module,
e.g., to adjust codon usage to optimise expression of the module in a particular expression
system.



Preferably, sequences of zinc fingers used in the present invention are not mutated from
their natural form. Advantageously, the natural zinc finger polypeptides are expressed in
nature.



A natural zinc finger binding motif is a structure well known to those in the art and
defined in, for example, Miller et al., (1985) EMBO J. 4: 1609-1614; Berg (1988) Proc.
Natl. Acad. Sci. USA 85: 99-102; Lee et al., (1989) Science 245: 635-637; see also
International patent applications WO 96/06166 and WO 96/32475, incorporated herein by
reference.



In general, a natural zinc finger framework has the structure:

X_{0-2} C X_{1-5} C X_{9-14} H X_{3-6} H/C (SEQ ID NO: 12)

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X (Formula A).



In a preferred aspect of the present invention, natural zinc finger nucleic acid binding
motifs may be represented as motifs having the following primary structure:

X_{0-2} C X_{1-5} C X_{2-7} X X X X X X X H X_{3-6} H/C (SEQ ID NO: 14)
                           -1 1 2 3 4 5 6 7             (SEQ ID NO: 13)

where X is any amino acid, and the numbers in subscript indicate the possible numbers of
residues represented by X (Formula A'). The numbers -1 through 7 refer to amino acid
position with respect to the beginning of the alpha-helical region of the zinc finger.

The Cys and His residues, which together co-ordinate the zinc metal atom, are marked in
bold text and are usually invariant. However, all naturally-occurring zinc finger modules,
even if they diverge from the above formula, are encompassed within the scope of this
invention.
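As a non-limiting sketch, Formula A and Formula A' can be written as regular expressions and used to test a candidate module and to read out helix positions -1 to +6. The function names are hypothetical, X is assumed to be any upper-case one-letter amino acid code, and the example sequences are the Zif268 fingers of Table 2. Modules that diverge from the formulae would simply fail to match, although they remain within the scope of the disclosure.

```python
import re

# Formula A: X(0-2) C X(1-5) C X(9-14) H X(3-6) H/C
FORMULA_A = re.compile(r"[A-Z]{0,2}C[A-Z]{1,5}C[A-Z]{9,14}H[A-Z]{3,6}[HC]")

# Formula A': the X(9-14) stretch is split into X(2-7) plus the seven residues
# at helix positions -1, 1, 2, 3, 4, 5, 6; the first zinc-binding His is +7.
FORMULA_A_PRIME = re.compile(
    r"[A-Z]{0,2}C[A-Z]{1,5}C[A-Z]{2,7}([A-Z]{7})H[A-Z]{3,6}[HC]"
)


def is_formula_a_finger(seq):
    """True if the whole sequence fits Formula A."""
    return FORMULA_A.fullmatch(seq) is not None


def helix_positions(seq):
    """Return the residues at positions -1 to +6, or None if Formula A' does not fit."""
    match = FORMULA_A_PRIME.fullmatch(seq)
    return match.group(1) if match else None


finger1 = "YACPVESCDRRFSRSDELTRHIRIH"  # Zif268 finger 1, taken from Table 2
finger2 = "FQCRICMRNFSRSDHLSTHIRTH"    # Zif268 finger 2, taken from Table 2
```

For the two fingers shown, `helix_positions` returns the same seven residues that open the α-helix column of Table 2.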

Zinc finger modules of formula A' are often arranged in tandem within a natural zinc
finger polypeptide, such that a zinc finger containing protein may have 2, 3, 4, 5, 6, 7, 8,
9 or more individual zinc finger motifs. In such a protein, individual zinc fingers are
joined to each other by a polypeptide sequence known as a linker. Generally, such a
natural linker lacks secondary structure, although the amino acids within the linker may
form local interactions when the protein is bound to its target site. By 'linker sequence' is
meant an amino acid sequence that links together adjacent zinc finger modules. For
example, in a natural zinc finger protein, the linker sequence is the amino acid sequence
which lies between the last residue of the α-helix in a zinc finger and the first residue of
the β-sheet in the next zinc finger. The linker sequence therefore joins together two zinc
fingers. For the purposes of the present invention, the last amino acid of the α-helix in a
zinc finger is considered to be the final zinc coordinating histidine (or cysteine) residue,
while the first amino acid of the following finger is generally a tyrosine / phenylalanine or
another hydrophobic residue. Since some natural zinc fingers do not start with a
hydrophobic residue (see Appendices), the start of a finger is sometimes harder to define
from amino acid sequence (or indeed zinc finger structure), and so some flexibility must
be allowed in this definition. Accordingly, in a natural zinc finger protein, threonine is
often considered to be the first residue in the linker, and proline is the last residue of the
linker. Thus, for example, in the natural Zif268 peptide the linker sequence is
-TG(E/Q)(K/R)P- (SEQ ID NO: 15). Although natural linkers can vary greatly in terms of
amino acid sequence and length, on the basis of sequence homology, the canonical
natural linker sequence is considered to be -TGEKP- (SEQ ID NO:3). Hence, the
preferred linker sequence to join zinc finger modules of the present invention is
-TGEKP-.

Additionally, a 'leader' peptide may be added to the N-terminal zinc finger of a poly-zinc
finger peptide to aid its expression, without changing the sequence of the natural zinc
finger module. Preferably, the leader peptide is MAEERP (SEQ ID NO: 16) or MAERP
(SEQ ID NO: 17).
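Purely as an illustrative sketch, the assembly of a poly-zinc finger peptide from unmodified modules, the canonical linker and a leader peptide amounts to string concatenation. The function name and the restriction to at least two modules are assumptions made for the example; the linker and leader sequences are those given above.

```python
CANONICAL_LINKER = "TGEKP"  # SEQ ID NO:3 in the text
LEADER = "MAERP"            # SEQ ID NO:17 in the text


def assemble_poly_zinc_finger(fingers, linker=CANONICAL_LINKER, leader=LEADER):
    """Join finger modules (listed N- to C-terminus) with the linker, after the leader.

    The finger sequences themselves are left untouched, in keeping with the
    preference that natural modules are not mutated.
    """
    if len(fingers) < 2:
        raise ValueError("a poly-zinc finger peptide needs at least two modules")
    return leader + linker.join(fingers)


# The three Zif268 fingers as listed in Table 2 of this disclosure.
zif268_fingers = [
    "YACPVESCDRRFSRSDELTRHIRIH",
    "FQCRICMRNFSRSDHLSTHIRTH",
    "FACDICGRKFARSDERKRHTKIH",
]
peptide = assemble_poly_zinc_finger(zif268_fingers)
```

Passing a different `linker` argument (for example one of the flexible linkers listed later in this section) would model the non-canonical embodiments in the same way.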

In general, naturally occurring zinc finger modules may be selected from those proteins
for which the DNA binding specificity is already known. For example, these may be the
proteins for which a crystal structure has been resolved: namely Zif268 (Elrod-Erickson
et al. (1996) Structure 4: 1171-1180), GLI (Pavletich & Pabo (1993) Science 261:
1701-1707), Tramtrack (Fairall et al. (1993) Nature 366: 483-487) and YY1 (Houbaviy et
al. (1996) Proc. Natl. Acad. Sci. USA 93: 13577-13582). Furthermore, the sequence
specificity of many naturally-occurring zinc fingers and zinc finger proteins is known.
In addition, this invention further provides for the determination of the binding specificity
of natural zinc finger modules for use in the present invention. See "Prediction of
Binding Specificity," infra.

Poly-Zinc Finger Peptides. 

It is desirable that a 'designer' transcription factor for uses such as gene therapy
and in transgenic organisms should have the ability to target virtually unique sites within
any genome. For complex genomes such as in humans, an address of at least 16 bp is
required to specify a potentially unique DNA sequence. Shorter DNA sequences have a
significant probability of appearing several times in a genome, raising the possibility of
obtaining undesirable non-specific gene targeting with a designed transcription factor
targeted to such a shorter sequence. As individual zinc fingers only bind 3 to 4
nucleotides, it is therefore necessary to construct multi-finger polypeptides to target these
longer sequences. A six-zinc finger peptide (with an 18 bp recognition sequence) could,
in theory, be used for the specific recognition of a single target site and hence, the
specific regulation of a single gene within any genome. In addition, a significant increase
in binding affinity might also be expected, compared to a protein with fewer fingers. In
simple terms, if a three-finger peptide (with a 9 bp recognition sequence) binds DNA with
nanomolar affinity, two tandemly linked three-finger peptides might be expected to bind
an 18 bp sequence with an affinity of 10^-15 to 10^-18 M. However, most previous attempts at
producing high-affinity 6-finger peptides (poly-zinc finger peptides) based on fusions of
two 3-finger domains have been unsuccessful in generating much of an improvement in
affinity over 3-finger peptides. Liu, Q., Segal, D. J., Ghiara, J. B. & Barbas, C. F. III
(1997) Proc. Natl. Acad. Sci. USA 94: 5525-5530; Kim, J-S. & Pabo, C. O. (1998) Proc.
Natl. Acad. Sci. USA 95: 2812-2817; Kamiuchi, T., Abe, E., Imanishi, M., Kaji, T.,
Nagaoka, M. & Sugiura, Y. (1998) Biochemistry 37: 13827-13834. To optimise both the
affinity and specificity of 6-finger peptides, a fusion of three 2-finger domains has been
shown to be advantageous. Moore, M., Klug, A. & Choo, Y. (2001) Proc. Natl. Acad.
Sci. USA 98: 1437-1441; and WO 01/53480. Therefore, in one embodiment, 2-finger
units are linked to make poly-zinc finger nucleotide-binding domains. A pool of 4096
such 2-finger units, that recognise all possible 6 bp sequences (4^6 = 4096), represents an
archive sufficient to rapidly create universal nucleic acid recognition, by simple linkage,
in an "off-the-shelf" manner. See Moore et al., supra and WO 01/53480.
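The arithmetic behind the 16 bp address and the 4096-member archive can be checked with a short calculation, offered here only as a sketch. It assumes a genome of random, equiprobable bases and counts a single strand, which is a simplification; the genome size of 3 x 10^9 bp is an assumed round figure for the human genome and is not taken from this disclosure. The function names are hypothetical.

```python
HUMAN_GENOME_BP = 3_000_000_000  # assumed round figure, not from the text


def expected_occurrences(site_length_bp, genome_size_bp):
    """Expected copies of one specific site on one strand of a random-sequence genome."""
    return genome_size_bp / 4 ** site_length_bp


def min_unique_site_length(genome_size_bp):
    """Smallest site length n for which 4**n is at least the genome size."""
    n = 1
    while 4 ** n < genome_size_bp:
        n += 1
    return n


address_bp = min_unique_site_length(HUMAN_GENOME_BP)           # shortest "unique" address
nine_bp_copies = expected_occurrences(9, HUMAN_GENOME_BP)      # 3-finger site
eighteen_bp_copies = expected_occurrences(18, HUMAN_GENOME_BP)  # 6-finger site
two_finger_archive_size = 4 ** 6                                # all possible 6 bp sites
```

Under these assumptions a 9 bp site is expected to recur many thousands of times, whereas an 18 bp site is expected less than once, which is the rationale for six-finger peptides given above.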



Poly-zinc finger peptides according to this invention may be constructed
containing 2, 3, 4, 5, 6 or more zinc finger modules. Such poly-zinc finger peptides may
contain inter-finger linkers other than the canonical (TGEKP) linker sequence, as
described, for example, in WO 01/53479; Moore, M., Choo, Y. & Klug, A. (2001) Proc.
Natl. Acad. Sci. USA 98: 1432-1436; and Moore, M., Klug, A. & Choo, Y. (2001) Proc.
Natl. Acad. Sci. USA 98: 1437-1441. Briefly, linker sequences may be flexible or
structured but, in general, will not form base-specific interactions with the target
nucleotide sequence. A 'flexible' linker is defined as one which does not form a specific
secondary structure in solution, whereas a 'structured' linker is defined as one that adopts
a particular secondary structure in solution. Preferably, flexible linkers include the
sequences GGERP (SEQ ID NO:18), GSERP (SEQ ID NO:19), GGGGSERP (SEQ ID
NO:20), GGGGSGGSERP (SEQ ID NO:21), GGGGSGGSGGSERP (SEQ ID NO:22), and
GGGGSGGSGGSGGSGGSERP (SEQ ID NO:23). Preferably, the structured linker
comprises an amino acid sequence that is not capable of specifically binding nucleic acid.
More preferably, the structured linker comprises the amino acid sequence of TFIIIA
finger IV. Alternatively, or in addition, the structured linker is derived from a zinc finger
by mutation of one or more of its base contacting residues to reduce or abolish nucleic
acid binding activity of the zinc finger. The zinc finger may be finger 2 of wild type
Zif268 mutated at positions -1, 2, 3 and/or 6.

In one embodiment, this invention provides for the construction and screening of
poly-zinc finger peptides containing at least one natural zinc finger module.

In another embodiment, this invention provides for the construction and screening of 
poly-zinc finger peptides containing at least one natural zinc finger module, linked with 
the canonical linker sequence -TGEKP- (SEQ ID NO:3). 

In one embodiment, methods for the construction and use of poly-zinc finger peptides
comprising natural zinc finger modules are provided.

In another embodiment, methods for the construction and use of poly-zinc finger peptides
comprising natural zinc finger modules, linked with the canonical linker sequence
-TGEKP- (SEQ ID NO:3), are provided.

In a further embodiment, methods for the construction and use of poly-zinc finger
peptides comprising at least one natural zinc finger module, containing either flexible or
structured linkers (as described above and in WO 01/53480), are provided.

b. Advantages of Natural Zinc Finger Modules 

Zinc finger modules are compact and stable structures of approximately 30 amino acids,
which contain the full information required to bind a nucleic acid triplet or overlapping
quadruplet. As such, they have proven to be extremely versatile scaffolds for engineering
novel DNA-binding domains. See, for example, Rebar, E. J. & Pabo, C. O. (1994)
Science 263, 671-673; Jamieson, A. C., Kim, S.-H. & Wells, J. A. (1994) Biochemistry
33, 5689-5695; Choo, Y. & Klug, A. (1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11163-
11167; Choo, Y., Sanchez-Garcia, I. & Klug, A. (1994) Nature 372, 642-645; Wu, H.,
Yang, W.-P. & Barbas III, C. F. (1995) Proc. Natl. Acad. Sci. USA 92, 344-348;
Greisman, H. A. & Pabo, C. O. (1997) Science 275, 657-661; Isalan, M., Klug, A. &
Choo, Y. (1998) Biochemistry 37, 12026-12033; Choo, Y. (1998) Nature Struct. Biol. 5,
264-265; Segal, D. J., Dreier, B., Beerli, R. R. & Barbas, C. F. (1999) Proc. Natl. Acad.
Sci. USA 96, 2758-2763; Isalan, M. & Choo, Y. (2000) J. Mol. Biol. 295, 471-477; and
Beerli, R. R., Dreier, B., Barbas, C.F. (2000) Proc. Natl. Acad. Sci. USA 97, 1495-1500.
The resulting engineered zinc finger domains have increased our knowledge of sequence-
specific DNA recognition, as well as provided a wide range of potential tools for
medicine and biotechnology.

As a result of these and other studies on zinc finger engineering, it has been recognised
that an individual zinc finger module does not necessarily recognise a simple nucleotide
triplet, as was first thought, but instead can bind to an overlapping quadruplet of double
stranded DNA. See, for example, Isalan et al. (1997) Proc. Natl. Acad. Sci. USA 94, 5617-
5621; and WO98/53057. In this respect, zinc finger engineering strategies have been
particularly important for deciphering the mechanism and specificity of these interactions.

With the recent completion of the human genome project and the rapidly advancing fields
of transgenic animals and plants, thousands of uncharacterised (and characterised) genes
have (and will) become valid targets for functional genomics and other such projects.
Concomitantly, engineered zinc finger peptides (often as a component of "designer"
transcription factors) are emerging as one of the most universal and desirable ways of
regulating the expression of specific genes within cells. See, for example, Choo, Y.,
Sanchez-Garcia, I. & Klug, A. (1994) Nature 372: 642-645; Beerli, R. R., Dreier, B. &
Barbas, C. F. III (2000) Proc. Natl. Acad. Sci. USA 97: 1495-1500; Kim, J-S. & Pabo, C.
O. (1998) Proc. Natl. Acad. Sci. USA 95: 2812-2817; Kang, J. S. & Kim, J-S. (2000)
J. Biol. Chem. 275: 8742-8748; Zhang et al. (2000) J. Biol. Chem. 275: 33850-33860; Liu
et al. (2001) J. Biol. Chem. 276: 11323-11334; Ren et al. (2002) Genes Dev. 16: 27-32;
and WO 00/41566.

Notwithstanding the remarkable progress in zinc finger engineering, there remain several
issues that limit the use of engineered zinc fingers for such applications. Points of
particular concern include the potential immunogenicity of non-natural zinc fingers, and
the 'fine-tuning' of particular aspects of the protein-DNA interactions to obtain optimal
and specific zinc finger-nucleic acid contacts.

The present invention addresses problems such as immunogenicity and sub-optimal binding
specificity by exploiting the vast repertoire of naturally occurring zinc fingers to
construct targeted zinc finger proteins having novel specificities.

Immunogenicity

The main function of the immune system is to detect, and render harmless, foreign
particles which have invaded the body as a whole, or individual cells or organs. 'Foreign'
in this context means non-host, i.e. a substance which has originated from a different
species, or one which has originated as a result of a mutational event (such as might
generate a malignant cell). On encountering such an antigenic particle, either in solution
or on the surface of an infected cell, the body's defences rapidly destroy/remove it by
complex pathways which involve the interaction of many members of the immune
system. For a good overview of immunology see Roitt, Essential Immunology, Blackwell
Science Ltd. and Roitt, I., Brostoff, J. & Male, D. Immunology, 4th Ed. Mosby. Hence, all
biological therapeutic agents, such as peptides, nucleic acids, viruses, etc., risk eliciting
an immune response in the recipient. Particularly for cases in which repeated doses of a
therapeutic agent are required, this response can be strong and potentially dangerous to
the host organism.

The immune system functions through either innate or adaptive responses. The innate
response is usually the body's first internal line of defence. Phagocytic cells recognise
and bind to foreign objects in extracellular environments. Once bound, the foreign object
is internalised and destroyed. Foreign therapeutic agents such as peptides and nucleic
acids, which are administered directly to the blood stream of the recipient, risk being
detected and possibly destroyed before they even reach their intended target. This
response is one of primitive non-specific recognition of non-host agents, and does not
adapt with time or exposure to the antigen.

Foreign therapeutic agents (or infectious agents such as bacteria and viruses) which
evade the innate immune response and may have been successfully delivered to a
particular cell have not necessarily avoided the host's immune system. Proteins that are
expressed in cells are routinely degraded within lysosomes, and short peptide fragments,
generally of between 6 and 9 amino acids, are transported to the cell surface and
presented to the host's immune system. This is the start of the host's second internal
defence mechanism against invasion, the adaptive immune response. The proteins
responsible for displaying such peptide fragments are known as major histocompatibility
complex (MHC) proteins. Lymphocyte cells, known as T-lymphocytes, dock with the
MHC proteins and scan the peptide fragments displayed. Contact of a T-lymphocyte with
a fragment specifically recognised as not belonging to the host organism initiates an
immunological cascade which ultimately results in the host cell being destroyed or
undergoing apoptosis. This mechanism is one of specific recognition, and once
recognised as foreign, the antigen is 'remembered' so that any future invasions by the
agent are dealt with more and more rapidly. B-cells are another type of lymphocyte that
recognise extracellular particles and then produce and release antibodies to help combat
the agent.

To avoid potentially damaging the host organism and to ensure the successful delivery
and action of a therapeutic peptide, it is important to make it as much like a host protein as
is reasonably possible. In the case of synthesised therapeutic antibodies for human use, a
great deal of work has gone into the 'humanisation' of antibodies produced by other
animal species (see EP 0239400). In this invention we present a solution for the
equivalent problem associated with zinc finger therapeutic peptides.

To some extent, prior art zinc finger engineering strategies have attempted to minimise
the risk of eliciting immune responses by using an engineering scaffold that is compatible
with (i.e. that originates from) the recipient, and by limiting the sizes of the varied regions
within the final product. For example, typical engineered zinc fingers utilize a scaffold
such as the three-finger DNA-binding domain of Zif268 (containing approximately 100
amino acid residues). Because the amino acid sequence of Zif268 is completely
conserved in a variety of species, including mice and humans, the scaffold is not itself
immunogenic in these species. However, in order to engineer new DNA-binding
domains, stretches of approximately 7 amino acids must be varied within each zinc
finger. These sequences of 7 amino acids represent modifications in positions -1, 1, 2, 3,
4, 5, and 6 of the α-helix of each finger. Although these engineered regions are
considered to be relatively small, they are approximately the length of the peptide
fragments displayed on the surface of cells by MHC molecules. Hence, they may provide
antigenic peptide fragments in several registers of the amino acid sequence, which may
result in dangerous and/or undesirable immune responses in the host.
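To make the notion of "several registers" concrete, the following sketch enumerates every fragment of 6 to 9 residues that overlaps the seven varied helix positions of a single finger. It is an illustration only: the function name is hypothetical, the fragment lengths are those stated above for MHC display, and the sketch makes no attempt to predict which fragments would actually be processed, displayed or recognised.

```python
def fragments_overlapping(sequence, varied_start, varied_len=7, frag_lengths=range(6, 10)):
    """List (offset, fragment) for each 6-9 residue window touching the varied stretch.

    varied_start is the 0-based index of helix position -1 within the sequence.
    """
    varied_end = varied_start + varied_len  # exclusive end of the varied stretch
    hits = []
    for k in frag_lengths:
        for i in range(len(sequence) - k + 1):
            # The window [i, i + k) overlaps [varied_start, varied_end).
            if i < varied_end and i + k > varied_start:
                hits.append((i, sequence[i:i + k]))
    return hits


finger1 = "YACPVESCDRRFSRSDELTRHIRIH"   # Zif268 finger 1, taken from Table 2
start = finger1.find("RSDELTR")         # helix positions -1 to +6 of that finger
hits = fragments_overlapping(finger1, start)
```

Even for one finger, the list contains dozens of candidate fragments that include at least one varied residue, which illustrates why a small engineered region can still present many potential epitopes.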

Accordingly, it is not known whether this type of engineering strategy will be entirely
sufficient to avoid all potential undesirable effects, or indeed whether it will create the
optimal framework for all zinc finger-nucleic acid interactions.

In addition to the zinc fingers themselves, it is also possible that inter-finger linker
sequences could present potential immunological problems. Fortunately, natural zinc
finger proteins display strong conservation and homology in their linker sequence. A
very large number of natural fingers are joined by the canonical linker peptide -TGEKP-
(SEQ ID NO: 3), located between the final zinc chelating residue (usually histidine) of the
first finger, and the first residue of the second finger (usually a large hydrophobic residue
such as tyrosine or phenylalanine, which begins the β-sheet). Hence, the use of the
canonical linker sequence -TGEKP- (SEQ ID NO: 3), to join natural zinc finger modules
in a non-natural order, will reduce the possibility of eliciting an undesirable immune
reaction to a minimum. Furthermore, there are so many natural zinc fingers which are
already joined by canonical linker sequences that, if deemed necessary, the database of
natural zinc fingers used for the construction of poly-zinc finger peptides may be
restricted to those already flanked by such linkers.

The periodicity of zinc fingers and their amenability to linkage using the TGEKP (SEQ
ID NO: 3) motif is illustrated in Table 2.

                                α-HELIX                       LINKER
                               -1123456
YA CPVESCDRRFS (SEQ ID NO: 24)  RSDELTRHIRIH (SEQ ID NO: 25)  TGEKP
FQ CRI  CMRNFS (SEQ ID NO: 26)  RSDHLSTHIRTH (SEQ ID NO: 27)  TGEKP
FA CDI  CGRKFA (SEQ ID NO: 28)  RSDERKRHTKIH (SEQ ID NO: 29)  TGEKP

Table 2. A functional three-finger DNA-binding domain based on the peptide sequence
of Zif268. TGEKP linker motifs are underlined. The helical residues of each zinc finger
are numbered relative to the first helical position, position +1. Conserved Cysteines and
Histidines forming the classical Cys2His2 zinc finger core are shown in bold.

Fine-Tuning of Zinc Finger-Nucleic Acid Interactions. 

It has previously been shown that zinc fingers cannot simply be regarded as independent
nucleic acid-binding modules. Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37,
12026-12033; Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94, 5617-
5621. The interactions between adjacent zinc fingers can be complex and involve overlap
of binding sites, which means that optimal interfaces are not easily engineered through
rational design. Combinatorial library selection systems, which if designed correctly
necessarily result in interface compatibility, can help to optimise
the zinc finger-nucleic acid interface. See, for example, WO98/53057. However, all
library selection systems suffer from the problem of library size, whereby because of
physical constraints, it is impossible to include an exhaustive combination of
randomisations to cover all potentially important sequence-space. For example, to
optimise the zinc finger-nucleic acid interface, subtle amino acid variations may be
needed, even from positions outside the recognition α-helix. Furthermore, alternative
approaches to zinc finger engineering, such as 'affinity maturation' through random
mutation or gene shuffling, which may (to a limited extent) increase the coverage of
sequence space, may also raise the probability of generating undesirable immunological
problems. Hence, it is possible that the creation of truly optimal zinc finger domains for
recognition of specific nucleic acid sequences may be outside the scope of traditional
engineering strategies.

In contrast, naturally occurring zinc finger modules have already been 'fine-tuned' by
thousands of years of natural selection and are, under normal circumstances,
non-immunogenic in their host organism. The human genome project has revealed that zinc
finger-containing proteins constitute the second most abundant family of proteins in
humans, with well over 600 members. Since zinc finger proteins usually contain several
individual zinc finger modules, the human genome provides a repertoire of thousands of
natural zinc finger modules for the creation of composite binding polypeptides.
Furthermore, because there are only 64 (= 4^3) possible 3 bp sequences and 256 (= 4^4)
possible 4 bp sequences, it is likely that, for every potential 3- or 4-nucleotide target
sequence, a natural zinc finger domain exists which is capable of binding to it. Consequently,
natural zinc fingers are a very useful resource for the production of composite binding
polypeptides comprising zinc fingers. At present, the natural binding site of many natural
zinc finger modules is not known. Thus, to be useful for the construction of composite
binding polypeptides, nucleotide sequence preferences for certain natural zinc fingers are
determined according to rules tables disclosed in the following section ("Binding
Specificity of Natural Zinc Finger Modules").
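One way to picture the bookkeeping implied above is to enumerate all 64 triplets (or 256 quadruplets) and subtract those for which a natural module of known specificity is already available. The sketch below is illustrative only: the function names and the module labels in the dictionary are hypothetical, and the only specificity taken from this disclosure is that fingers 1 and 3 of Zif268 bind GCG.

```python
from itertools import product


def all_sites(length):
    """Every DNA sequence of the given length, written 5'->3' (4**length in total)."""
    return ["".join(bases) for bases in product("ACGT", repeat=length)]


def uncovered_sites(known_specificities, length=3):
    """Sites of the given length not bound by any module in {module_label: site}."""
    covered = set(known_specificities.values())
    return [site for site in all_sites(length) if site not in covered]


# Hypothetical labels; the GCG specificity of these two fingers is from the text.
known = {"Zif268_finger1": "GCG", "Zif268_finger3": "GCG"}
remaining = uncovered_sites(known)
```

As further natural modules are characterised, entries would be added to the mapping until the list of uncovered sites is empty.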

To create optimal poly-zinc finger peptides the potentially significant problem of
interface incompatibility must be addressed, since natural zinc finger modules will not
necessarily be compatible with each other when juxtaposed. In this respect, a library
construction and screening system is preferably employed which links natural zinc finger
modules in non-natural combinations, and screens them against possible target sequences
of greater than 3 or 4 bp in length (which represents the possible binding site of a single
zinc finger module), to determine optimal 2- or 3-finger domains. In this way, the
cooperative nature of zinc finger binding is taken into account in the design and selection
of composite binding polypeptides, and in the determination of the sequence specificity of
their binding. In one embodiment, a library of poly-zinc finger peptides containing at
least one natural zinc finger module is provided. Preferably, poly-zinc finger peptides of
the library contain at least two natural zinc finger modules.

c. Binding Specificity of Natural Zinc Finger Modules

Disclosed herein are certain improvements to current limitations on the use of customised
zinc finger nucleic acid binding domains, through the use of natural zinc finger modules.
By using either natural 1-finger or 2-finger sub-domains, and/or novel
combinatorially-mixed, pre-selected 2-finger sub-domains, it is possible to construct poly-zinc finger
peptides that bind any desired nucleotide target sequence, using non-natural combinations
of natural zinc fingers.

This approach is particularly suited for human gene therapy applications, but the
invention is not limited to zinc finger modules encoded by the human genome. For
applications within transgenic animals such as mice, chickens, etc., the same system can
be used, but incorporating natural zinc finger modules from those species instead (see
Example 3). The genome of any organism (e.g., animal, plant, bacterium, virus, etc.) can
thus provide a genetic 'toolbox' of non-immunogenic, structurally optimised zinc fingers
for applications in that organism.

Before such zinc finger modules can be utilised, however, it is essential that their optimal
binding site is determined, in isolation, or preferably as part of a 2- or 3-finger
subdomain. Natural zinc finger modules are advantageously fused into subdomains
comprising two or three zinc finger modules in random arrangement, optionally
comprising an anchor finger, then subjected to binding site analysis. An 'anchor' zinc
finger is one for which the binding specificity is known, such as, for example, finger 1 or
finger 3 of Zif268, each of which binds the sequence 5'-GCG-3'. An anchor finger is
attached to the N- or C-terminus of the zinc finger module(s) or subdomain for which the
binding specificity is to be determined, and acts as an anchor to set the binding register
for the binding site selection. For example, if the binding site preference of a pair of
natural zinc fingers is to be determined, finger 1 of Zif268 may be fused to the
N-terminus of the pair of natural fingers, and a 5'-GCG-3' anchor sequence is placed at the
3' end of 6 or more randomised nucleotides. Selection of the optimal binding site may
thus be conducted with an oligonucleotide containing the sequence
5'-XXX-XXX-GCG-3' (SEQ ID NO: 30), where X is any specified nucleotide. The anchor
sequence thereby
allows the binding site preference of the zinc finger libraries to be easily determined.
Such procedures are described in the Examples.
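
By way of illustration only, the way in which a 3' anchor fixes the binding register can be sketched in Python. The function name `anchored_consensus`, the majority-vote consensus and the example reads are assumptions made for this sketch and are not part of the disclosed procedure; the reads are invented solely to exercise the code.

```python
from collections import Counter

ANCHOR = "GCG"  # anchor triplet named in the text (finger 1 or finger 3 of Zif268)


def anchored_consensus(sites, anchor=ANCHOR, n_random=6):
    """Tally bases at each randomised position of anchored binding sites.

    Each site is read 5'->3' and must end with the anchor, which fixes the
    register: the n_random bases immediately 5' of the anchor are compared
    position by position. Returns (consensus, per-position counts).
    """
    if not sites:
        raise ValueError("no sites supplied")
    counts = [Counter() for _ in range(n_random)]
    for site in sites:
        site = site.upper()
        if not site.endswith(anchor) or len(site) < n_random + len(anchor):
            raise ValueError(f"site {site!r} lacks a 3' {anchor} anchor")
        # The window of randomised bases sits directly 5' of the anchor.
        window = site[-(n_random + len(anchor)):-len(anchor)]
        for position, base in enumerate(window):
            counts[position][base] += 1
    # Hypothetical choice: report the most frequent base at each position.
    consensus = "".join(c.most_common(1)[0][0] for c in counts)
    return consensus, counts


# Invented reads, used only to show the call; not experimental data.
example_sites = ["GGATAAGCG", "GGATCAGCG", "GGCTAAGCG"]
consensus, _ = anchored_consensus(example_sites)
print(consensus)
```

Because every read is aligned on the same anchor, no separate alignment step is needed before the per-position tallies are made.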

Screening for Zinc Finger Binding Specificity 

There are various approaches, known to those in the art, for screening nucleic acid
binding peptides for their binding specificity. To determine the binding specificity of, for
example, zinc finger peptides, procedures can be conducted using: (a) a library of zinc
fingers and a specified target sequence, to select one or more zinc finger peptides with a
particular binding preference; or (b) a single zinc finger peptide and a random population
of target sequences, to select one or more optimal binding sites for a particular peptide.
For many applications, such as for the creation of transcription factors for regulating
specific gene activity, it is often preferable to screen zinc finger libraries against specific
target sequences. In this way, the search is geared towards a particular application.
However, if the function or binding specificity of a natural protein is the object of the
investigation, a library of potential binding sites can be screened using a single peptide.
Some such methods are outlined below.

A typical method for screening libraries of nucleic acid binding polypeptides against
specific target sites is that of phage display. Phage display protocols generally involve
expressing the peptides under study as fusions with the gIII minor coat protein of
bacteriophage (J. McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering
4, 955-961). Suitable protocols for the selection of zinc finger peptides have been
described and are well known to those in the art. See, for example, Choo, Y. & Klug, A.
(1994) Proc. Natl. Acad. Sci. U.S.A. 91, 11163-11167; Choo, Y., Sanchez-Garcia, I. &
Klug, A. (1994) Nature 372, 642-645; Choo, Y. (1998) Nature Struct. Biol. 5, 264-265;
Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; Isalan, M., Klug, A. &
Choo, Y. (1998) Biochemistry 37, 12026-12033; Isalan, M. & Choo, Y. (2000) J. Mol.
Biol. 295, 471-477; Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94,
5617-5621; WO 01/53480, WO 01/53479, WO 96/06166, WO 98/53057, WO 98/53058, WO
98/53059 and WO 98/53060 and references cited therein; see also Examples, infra. In
general, sequences comprising target sites are bound, such as through biotin-streptavidin,
to a solid support, such as a magnetic particle, or the surface of a tube or well. A solution
of phage expressing members of a library of zinc finger peptides is then added to the
immobilised target site. Non-bound phage are washed away and bound phage (containing
the DNA encoding the bound zinc finger peptide) are collected. The collected phage
sample is usually reused in further rounds of selection to enrich for the tightest binding
zinc finger peptide.

Phage display protocols based on random mutagenesis of zinc finger modules are known
to have a number of limitations. First, as discussed above, the library size that can be
expressed on the surface of phage is limited by the efficiency of procedures such as
cloning and transformation. Furthermore, the efficiency of incorporation of gIII-zinc
finger fusions into phage and hence, zinc finger peptide expression, is determined by the
number of zinc finger modules. Therefore, 2-finger peptides are expressed more
efficiently than 3-finger peptides and so on. For this reason, phage display protocols are
generally limited to the assay of polypeptides comprising 3 or fewer zinc finger modules.

An alternative to phage display is an in vitro selection system. In such a system, libraries
of zinc fingers can be produced by PCR using degenerate primer oligonucleotides.
Target binding sites are added to the end of the DNA encoding the zinc finger peptide.
Zinc finger peptide expression may be performed directly from PCR products using an in
vitro expression kit, such as the TNT T7 Quick Coupled Transcription/Translation
System for PCR DNA (Promega, Madison, WI, USA), or another suitable expression
system. The components of the expression reaction (including the zinc finger
gene/binding site) are compartmentalised by suspension in an emulsion, in such a way
that (on average) only one copy of the zinc finger gene/binding site is present in each
compartment. See, for example, Tawfik, D.S. & Griffiths, A.D. (1998) Nat. Biotechnol.
16: 652-656. Zinc finger peptides which bind the specified target site (and the gene
encoding them) can be collected using, for example, a suitable epitope tag (such as myc,
FLAG or HA tags), and the non-bound binding sites/zinc finger genes are removed. The
genes encoding zinc finger peptides that bind the required target site can then be
amplified by PCR and used in further rounds of selection if required.

A preferred method for selecting a zinc finger peptide which binds a specified target
sequence is described in Example 4. Briefly, the DNA encoding a library of zinc finger
peptides with an attached epitope tag is diluted into as many aliquots as it is possible to
screen (e.g. 384 or 1534 aliquots). This creates pools of sub-libraries with reduced
numbers of variants. The DNA is then amplified by PCR and used to produce protein,
from a suitable in vitro expression system, as described above. A specified binding site
with an attached biotin molecule, and a horseradish peroxidase (HRP)-conjugated
antibody to the peptide-attached epitope tag may then be added. Binding site/bound
zinc finger/antibody complexes may be collected by binding to streptavidin and the
samples are washed to remove unbound zinc finger and antibodies. The samples
containing the highest amount of bound zinc finger peptide can be detected by adding an
HRP substrate solution. The original DNA stock from such positive samples may then be
diluted into aliquots (as above), PCR-amplified and used for the next round of selection.
In this way, pools of zinc finger encoding genes with the desired activity are isolated,
subdivided into pools of reduced variation and re-isolated until the most active clone is
identified.
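
As a rough guide to how many rounds of subdivision such a scheme might require, the following Python sketch applies an idealised model in which each round spreads the current positive pool evenly over a fixed number of aliquots. The function name `rounds_to_single_clone`, the even-split assumption and the library size of 10^6 used in the example are assumptions of this sketch and do not appear in the procedure described above.

```python
def rounds_to_single_clone(library_size, aliquots_per_round):
    """Rounds of subdivide-and-rescreen until a positive pool holds about one variant.

    Idealised model (an assumption): each round divides the current pool evenly
    over `aliquots_per_round` aliquots, so pool complexity falls by that factor.
    """
    if library_size < 1 or aliquots_per_round < 2:
        raise ValueError("need library_size >= 1 and aliquots_per_round >= 2")
    rounds, complexity = 0, library_size
    while complexity > 1:
        # Ceiling division: a pool cannot hold a fraction of a variant.
        complexity = -(-complexity // aliquots_per_round)
        rounds += 1
    return rounds


# Aliquot numbers taken from the text; the library size is hypothetical.
for aliquots in (384, 1534):
    print(aliquots, rounds_to_single_clone(10**6, aliquots))
```

In practice pools are unlikely to divide evenly and signal from rare binders may be lost on dilution, so the figure returned is better treated as a lower bound.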

Principal advantages of the in vitro systems described above are: (a) there is virtually no
limit to the library size which can be screened (up to 10^12 different PCR products can
easily be made); and (b) polypeptides comprising larger numbers of linked zinc finger
modules (e.g., 4, 5, 6, 7, or more) can be assayed. Another in vitro selection system
which can be used is polysome/ribosome display. See, for example, Mattheakis, L.C.,
Bhatt, R.R. & Dower, W.J. (1994) Proc. Natl. Acad. Sci. USA 91: 9022-9026; and WO
00/27878.

Protocols for the reverse selection procedure, i.e. the selection of a particular binding site
from a mixed population using a single nucleic acid binding polypeptide, include SELEX
(systematic evolution of ligands by exponential enrichment) and microarray techniques.
The SELEX procedure has been well described. See, for example, Drolet, D.W., Jenison,
R.D., Smith, D.E., Pratt, D. & Hicke, B.J. (1999) Comb. Chem. High Throughput Screen
2: 271-278; Burden, D.A. & Osheroff, N. (1999) J. Biol. Chem. 274: 5227-5235;
Shultzaberger, R.K. & Schneider, T.D. (1999) Nucleic Acids Res. 27: 882-887; Marozzi,
A., Meneveri, R., Giacca, M., Gutierrez, M.I., Siccardi, A.G. & Ginelli, E. (1998) J.
Biotechnol. 15: 117-128; and US Patents No. 5,270,163; 5,475,096; 5,595,877;
5,670,637; 5,696,249; 5,817,785 and 6,331,398. A single nucleic acid binding
polypeptide is expressed, either in vitro or in vivo, and screened against a library of target
sequences. Nucleic acid binding polypeptides are collected (along with any bound target
sites) using an epitope tag (as above) or another suitable procedure. Bound target sites
are amplified by PCR and may be used in further rounds of selection, to enrich for the
optimal binding site, or sequenced.

Microarray technology provides a method of screening a particular polypeptide or nucleic
acid against thousands to millions of target sequences on a single solid support such as, for
example, a glass or nitrocellulose slide. For example, the members of a library encoding
polypeptides comprising 2 linked zinc fingers will bind a 6 bp recognition sequence.
Hence, there are 4096 (= 4^6) unique binding sites for such a library. All 4096 of these
sites can be arrayed onto a single glass slide, for example, allowing a specified 2-finger
peptide to be screened simultaneously against every possible binding site. The amount of
binding to each target sequence can be visualised and quantified using simple
fluorescence measurements. For example, the zinc finger peptide may be expressed in
vitro, or on the surface of phage. Isolated zinc finger peptides may contain an epitope tag
for labelling purposes, whereas bound phage can be detected using a primary antibody
against a phage coat protein, such as gVIII. A secondary antibody conjugated to, for
example, R-phycoerythrin, horseradish peroxidase or alkaline phosphatase, can be used to
provide a visible, quantifiable signal when a suitable substrate is applied. See, for
example, Bulyk et al. (2001) Proc. Natl. Acad. Sci. USA 98(13): 7158-7163, which is
incorporated by reference in its entirety.
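
The count of possible binding sites for a 2-finger peptide can be checked by direct enumeration. The Python sketch below is illustrative only; the function name `all_sites` and the assumption of exactly three bases per finger (ignoring any overlap between adjacent subsites) are choices made for this example.

```python
from itertools import product

BASES = "ACGT"


def all_sites(n_fingers, bases_per_finger=3):
    """Yield every possible binding site (written 5'->3') for n_fingers linked fingers."""
    length = n_fingers * bases_per_finger
    for combo in product(BASES, repeat=length):
        yield "".join(combo)


two_finger_sites = list(all_sites(2))
print(len(two_finger_sites))  # 4**6 = 4096, the figure given in the text
```

The same generator gives 4^9 = 262,144 sites for three fingers, which indicates how quickly the number of array features grows with each added finger.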

Prediction of Binding Specificity

The screening approaches described above rely on the assay of large libraries of
randomly-selected natural zinc finger modules, to obtain one or more zinc finger modules
that optimally bind a particular target nucleic acid sequence. In order to simplify the
process further and ensure a more rapid selection of optimal zinc finger modules for a
particular target site, sub-libraries can be created. In this disclosure, the term
'sub-library' refers to a library of natural zinc finger modules that have been roughly
categorised according to their predicted binding specificity. For example, the total
population of natural zinc fingers can be sub-divided to create libraries comprising zinc
finger modules whose predicted binding sites are guanine (G) rich, cytosine (C) rich,
adenine (A) rich or thymine (T) rich. Alternatively, sub-libraries can be categorised as
binding G in the 3' position, in the central position, or in the 5' position of a nucleotide
triplet, etc. Alternatively, sub-libraries can be created which comprise zinc finger
modules predicted to bind a particular triplet sequence such as, for example, GGG, GGA,
GGC, GGT, GAG, GCG, GTG, etc. This approach combines knowledge of the modes of
zinc finger-nucleic acid recognition, gained from studies on artificial zinc finger variants,
with the benefits of combinatorial library selection. It also takes into account the fact that
concerted interactions between adjacent zinc fingers, i.e. overlapping contacts, can affect
the binding affinity and/or specificity of individual zinc fingers. See, for example,
Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-12033; Isalan, M.,
Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94, 5617-5621. Thus, for example, a
composite binding polypeptide comprising two fingers, each having a predicted binding
specificity for a particular triplet, can be easily screened to determine if that pair of
fingers is compatible for binding to the 6-nucleotide target site
comprising their individual target sequences. This strategy is described further in the
Examples.
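
The categorisation into sub-libraries described above can be sketched as a simple grouping step. In the Python below, the function names, the finger identifiers and their placeholder triplets are hypothetical, and the threshold of two or more identical bases for a triplet to count as 'rich' in that base is an assumption, since no threshold is stated.

```python
from collections import defaultdict


def sub_library_labels(triplet, min_count=2):
    """Return the sub-library labels that apply to one predicted 5'->3' triplet."""
    triplet = triplet.upper()
    labels = [f"triplet {triplet}"]
    # Assumed definition: a triplet is 'X-rich' if X occurs at least min_count times.
    for base in "GCAT":
        if triplet.count(base) >= min_count:
            labels.append(f"{base}-rich")
    # Position-based categories for G, as mentioned in the text.
    for name, base in zip(("5'", "central", "3'"), triplet):
        if base == "G":
            labels.append(f"G at {name} position")
    return labels


def build_sub_libraries(predicted_sites):
    """Group finger identifiers under every label their predicted triplet earns."""
    sub_libraries = defaultdict(list)
    for finger_id, triplet in predicted_sites.items():
        for label in sub_library_labels(triplet):
            sub_libraries[label].append(finger_id)
    return dict(sub_libraries)


# Hypothetical finger identifiers with placeholder predictions.
example = {"finger_A": "GGA", "finger_B": "GCG", "finger_C": "TAT"}
for label, members in build_sub_libraries(example).items():
    print(label, members)
```

A finger may fall into several sub-libraries at once, which matches the 'rough' categorisation described above.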

For the process of creating sub-libraries of natural zinc fingers according to predicted
binding preference, the rules set forth in international patent applications WO 96/06166,
WO 98/53057, WO 98/53058, WO 98/53059 and WO 98/53060, and described in more
detail below, are used. These rules allow the assignment of an amino acid residue, in an
appropriate position of the recognition region of a zinc finger module (generally
comprising amino acids -1 through +6, with respect to the start of the alpha-helical
portion of the finger), which will bind a specified nucleotide in a triplet or quadruplet
target subsite. However, these rules can also be used to predict the sequence of a target
subsite that would be preferentially bound by a zinc finger of given amino acid sequence.
In particular, the identity of the amino acid residing at a particular position in the
recognition region of a natural zinc finger module can be used to predict the identity of a
nucleotide at a particular location in a target subsite. These 'rules' should be considered
as a guide to target site preference and not a guaranteed prediction, as binding site
specificity may be determined by variations elsewhere in the zinc finger module (i.e.
outside of the recognition region), may be influenced by context, or may be influenced by
factors as yet unknown. It should also be noted that some rules may be more generally
applicable than others.

In the application of these rules, it should be noted that the recognition region of a zinc
finger aligns such that the N-terminal to C-terminal sequence of the finger is arranged
along the nucleic acid strand to which it binds in a 3'-to-5' direction. As a result, when a
zinc finger sequence and a nucleic acid sequence (to which the finger binds) are aligned,
the primary interactions occur between the zinc finger and the 'minus' strand of the
nucleic acid sequence (i.e. the strand which has a 3'-to-5' orientation). Furthermore, as
stated above, the recognition region of a zinc finger comprises amino acids -1 through
+6, with respect to the start of the alpha-helical portion of the finger. With respect to a
particular zinc finger, an amino acid residue designated ++2 refers to the residue present
in the adjacent (in the C-terminal direction) zinc finger, which (in certain instances)
buttresses an amino acid-nucleotide interaction and/or participates in a cross-strand
interaction with a nucleotide.

Thus, the following set of rules can be used to predict a 3 bp target subsite for a given
natural zinc finger module:
(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp;
(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2 is not Asp;
(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp;
(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp;
(e) if the central base in the triplet is G, then position +3 in the α-helix is His;
(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;
(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val;
(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;
(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln;
(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln;
(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.
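
The triplet rules can be read in reverse to list the target subsites consistent with a given set of helix residues. The Python sketch below is one possible literal inversion of rules (a) to (l) and is offered as an illustration, not as the method of the disclosure. The names `predict_triplets` and `five_prime_bases` are hypothetical, the letter 'N' is used where no rule applies, and the proviso on Ala in rule (g) is not enforced because 'small residue' is not defined in this passage.

```python
from itertools import product

# Inverted from rules (e)-(h): residue at +3 -> candidate central bases.
CENTRAL = {"His": "G", "Asn": "A", "Ala": "T", "Ser": "TC", "Val": "TC",
           "Asp": "C", "Glu": "C", "Leu": "C", "Thr": "C"}
# Inverted from rules (i)-(l): residue at -1 -> candidate 3' bases.
THREE_PRIME = {"Arg": "G", "Gln": "AT", "Asn": "T", "Asp": "C"}


def five_prime_bases(plus6, next_plus2):
    """Candidate 5' bases from rules (a)-(d); next_plus2 is position ++2."""
    asp = next_plus2 == "Asp"
    bases = set()
    if plus6 == "Arg" or (plus6 in ("Ser", "Thr") and asp):
        bases.add("G")  # rule (a)
    if plus6 == "Gln" and not asp:
        bases.add("A")  # rule (b)
    if plus6 in ("Ser", "Thr") and asp:
        bases.add("T")  # rule (c)
    if not asp:
        bases.add("C")  # rule (d): any residue at +6 is permitted
    return bases


def predict_triplets(minus1, plus3, plus6, next_plus2=None):
    """Return sorted 5'->3' triplets consistent with the rules ('N' = no rule applies)."""
    five = five_prime_bases(plus6, next_plus2) or {"N"}
    central = set(CENTRAL.get(plus3, "N"))
    three = set(THREE_PRIME.get(minus1, "N"))
    return sorted("".join(t) for t in product(five, central, three))


print(predict_triplets("Arg", "Glu", "Arg", next_plus2="Asp"))
print(predict_triplets("Arg", "Glu", "Arg"))
```

With Arg at -1, Glu at +3, Arg at +6 and Asp at ++2, only GCG is returned. Without information on ++2, rule (d) also admits C at the 5' position, which illustrates why the rules are best treated as a guide and not as a guaranteed prediction.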

Furthermore, a natural zinc finger module may be capable of binding specifically to a
four-nucleotide target subsite that overlaps with the target subsite of an adjacent zinc
finger. In this case a different set of 'rules' can be used to determine predicted binding
sites for each zinc finger module. Accordingly, in the description below, the overlapping
4 bp binding site is described such that position 4 is the 5' base of a typical triplet binding
site, position 3 is the central position of a typical triplet, position 2 is the 3' position of a
typical triplet, and position 1 is the complement of the nucleotide which is contacted by
the cross strand interaction from the +2 position of the zinc finger module. Position 1 can
also be considered to be the 5' base of the triplet or quadruplet contacted by an adjacent
(in the N-terminal direction) finger, if present.

Binding to each base of a quadruplet by an α-helical zinc finger nucleic acid binding
motif in a natural protein can be predicted with reference to the following rules:
(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys;
(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or Val;
(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val or Lys;
(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val, Ala, Glu or Asn;
(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;
(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val;
(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;
(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;
(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr;
(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His;
(m) if base 1 in the quadruplet is G, then position +2 is Glu;
(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln;
(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys;
(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
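
The quadruplet rules (a) to (p) above lend themselves to a table-driven check of whether a given set of helix residues is consistent with a candidate quadruplet. The Python sketch below is illustrative; the names `QUADRUPLET_RULES` and `satisfies_quadruplet_rules` are hypothetical, the quadruplet is assumed to be written 5'-to-3' as base 4, base 3, base 2, base 1, and the proviso on Ala in rule (g) is not enforced because 'small residue' is not defined in this passage.

```python
# base position -> (helix position, {base: residues permitted by the rules})
QUADRUPLET_RULES = {
    4: ("+6", {"G": {"Arg", "Lys"},
               "A": {"Glu", "Asn", "Val"},
               "T": {"Ser", "Thr", "Val", "Lys"},
               "C": {"Ser", "Thr", "Val", "Ala", "Glu", "Asn"}}),
    3: ("+3", {"G": {"His"},
               "A": {"Asn"},
               "T": {"Ala", "Ser", "Val"},
               "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}}),
    2: ("-1", {"G": {"Arg"},
               "A": {"Gln"},
               "T": {"His", "Thr"},
               "C": {"Asp", "His"}}),
    1: ("+2", {"G": {"Glu"},
               "A": {"Arg", "Gln"},
               "C": {"Asn", "Gln", "Arg", "His", "Lys"},
               "T": {"Ser", "Thr"}}),
}


def satisfies_quadruplet_rules(quadruplet, helix):
    """Return True if every base of the quadruplet is matched by a permitted residue.

    quadruplet: four bases written 5'->3' (base 4, base 3, base 2, base 1).
    helix: dict mapping '-1', '+2', '+3' and '+6' to three-letter residue codes.
    """
    quadruplet = quadruplet.upper()
    if len(quadruplet) != 4:
        raise ValueError("a quadruplet has exactly four bases")
    for base_position, base in zip((4, 3, 2, 1), quadruplet):
        helix_position, allowed = QUADRUPLET_RULES[base_position]
        if helix[helix_position] not in allowed.get(base, ()):
            return False
    return True


helix = {"-1": "Arg", "+2": "Ser", "+3": "Glu", "+6": "Arg"}
print(satisfies_quadruplet_rules("GCGT", helix))
```

Keeping the rules in a single table means that a refined rule set can be substituted by replacing the table, without altering the checking function.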

The above rules may be further refined to those described below:
(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or position +6 is Ser or Thr and position ++2 is Asp;
(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2 is not Asp;
(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr and position ++2 is Asp;
(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any amino acid, provided that position ++2 in the α-helix is not Asp;
(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;
(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;
(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;
(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp, Glu, Leu, Thr or Val;
(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;
(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;
(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln;
(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp;
(m) if base 1 in the quadruplet is G, then position +2 is Asp;
(n) if base 1 in the quadruplet is A, then position +2 is not Asp;
(o) if base 1 in the quadruplet is C, then position +2 is not Asp;
(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

The rules therefore predict that the presence of an Asp (D) residue at position +2 will
preclude binding to either A or C by an amino acid at position +6 in an adjacent
N-terminal finger. See Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37, 12026-12033;
Isalan, M., Choo, Y. & Klug, A. (1997) Proc. Natl. Acad. Sci. USA 94, 5617-5621. Therefore,
natural zinc fingers containing Asp, Glu, Asn or Gln at +6 are likely to be incompatible
with any C-terminal finger containing an Asp residue at position +2. Although there are
many such rules to describe the overlap between adjacent zinc fingers, a certain degree of
degeneracy exists in these rules. Nonetheless, physical selection procedures (e.g., library
construction and screening) can be used to extract optimal pairs of fingers for any given
target subsite interface.
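
The incompatibility just described can be used as a simple pre-filter before physical selection. The Python sketch below is illustrative only; the function names and the finger records are hypothetical, and a pair that passes the filter is merely not excluded by this one rule.

```python
# Residues at +6 stated in the text to be likely incompatible with Asp at ++2.
INCOMPATIBLE_PLUS6 = {"Asp", "Glu", "Asn", "Gln"}


def likely_incompatible(n_terminal_plus6, c_terminal_plus2):
    """True if the N-terminal finger's +6 residue clashes with Asp at +2 of the C-terminal finger."""
    return c_terminal_plus2 == "Asp" and n_terminal_plus6 in INCOMPATIBLE_PLUS6


def compatible_pairs(fingers):
    """Return ordered (N-terminal, C-terminal) finger pairs not excluded by the rule.

    fingers: dict mapping a finger identifier to {'+2': residue, '+6': residue}.
    """
    return [(n, c)
            for n in fingers
            for c in fingers
            if n != c and not likely_incompatible(fingers[n]["+6"], fingers[c]["+2"])]


# Hypothetical fingers, used only to show the call.
fingers = {
    "X": {"+2": "Asp", "+6": "Gln"},
    "Y": {"+2": "Ser", "+6": "Arg"},
    "Z": {"+2": "Asp", "+6": "Asn"},
}
print(compatible_pairs(fingers))
```

Given the degeneracy noted above, such a filter would be expected only to reduce the number of pairs taken forward to library construction and screening.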

Not all natural zinc fingers have a DNA-binding function. For example, it is known that
many zinc fingers, such as those from TFIIIA, bind to RNA (Clemens, K. R. et al. (1993)
Science 260: 530-533; Bogenhagen, D.F. (1993) Mol. Cell. Biol. 13: 5149-5158; Searles,
M. A. et al. (2000) J. Mol. Biol. 301: 47-60). The rules governing RNA binding by zinc
fingers are less well understood than those of DNA binding, but some RNA binding zinc
fingers can be identified on the basis of a characteristic sequence motif. Clemens, K. R.
et al. (1993) Science 260: 530-533; Bogenhagen, D.F. (1993) Mol. Cell. Biol. 13: 5149-5158;
Searles, M. A. et al. (2000) J. Mol. Biol. 301: 47-60. Furthermore, some zinc
fingers, such as those from the protein Ikaros, are able to form protein-protein
interactions. Such zinc fingers often contain large hydrophobic patches. Mackay, J. P. &
Crossley, M. (1998) Trends Biochem. Sci. 23: 1-4.

To this end, applied bioinformatic processing can help to determine which candidates in a
particular genome are best suited to fulfilling a particular function, such as DNA-binding.
In the case of zinc fingers, numerous documented databases exist denoting amino acid
residues that are most likely to be found at particular positions within a DNA-binding
zinc finger. See, for example, Isalan, M., Klug, A. & Choo, Y. (1998) Biochemistry 37,
12026-12033; Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; WO
98/53060; WO 98/53059; WO 98/53058. As an example, disclosed herein is a database
of approximately 200 natural human zinc fingers which have been selected (on the basis
of coded contacts) as having potentially useful DNA-binding activity (see Example 1).
Also disclosed in Example 1 are the predicted DNA target sequences of these zinc
fingers, assigned according to the rules set out above.

As the human genome contains almost 700 zinc finger-containing proteins, there are
many other candidates that can be included in a more inclusive library of natural zinc
fingers. A selection of these is disclosed in Example 2.

Similar work can be carried out in other organisms, such as farm (cows, pigs, sheep,
chickens, etc.), laboratory (monkeys, rats, mice, etc.) and domestic (dogs, cats, etc.)
animals. In this case, it is necessary to select natural zinc finger modules from the
respective genomes of such organisms. Examples of zinc finger modules which have
been selected from mouse, chicken and certain plant genomes are disclosed in
Example 3.

d. Zinc Finger Chimeric Peptides 

In a preferred embodiment, the composite binding polypeptides described herein 
comprise chimeric nucleic acid binding polypeptides. 

A chimeric nucleic acid binding polypeptide, also referred to as a fusion
polypeptide, comprises a binding domain (comprising a number of nucleic acid binding
polypeptide modules or fingers) designed to bind specifically to a target nucleotide
sequence, together with one or more further biological effector domains or functional
domains. The terms "biological effector domain" and "functional domain" refer to any
polypeptide (or functional fragment thereof) that has a biological function. Included are
enzymes, receptors, regulatory domains, transcriptional activation or repression domains,
binding sequences, dimerisation, trimerisation or multimerisation sequences, sequences
involved in protein transport, localisation sequences such as subcellular localisation
sequences, nuclear localisation, protein targeting or signal sequences. Furthermore,
biological effector domains may comprise polypeptides involved in chromatin
remodelling, chromatin condensation or decondensation, DNA replication, transcription,
translation, protein synthesis, etc. Fragments of such polypeptides comprising the
relevant activity (i.e., functional fragments) are also included in this definition. Preferred
biological effector domains include transcriptional modulation domains such as
transcriptional activators and transcriptional repressors, as well as their functional
fragments.

The effector domain(s) can be covalently or non-covalently attached to the 
binding domain. 

Chimeric nucleic acid binding polypeptides preferably comprise transcription
factor activity, for example, a transcriptional modulation activity such as transcriptional
activation or transcriptional repression activity. For example, a zinc finger chimeric
polypeptide may comprise a binding domain designed to bind specifically to a particular
nucleotide sequence, and one or more further biological effector domains, preferably a
transcriptional activation or repression domain, as described in further detail below. The
zinc finger chimeric polypeptide may comprise one or more zinc fingers or zinc finger
binding modules.

Preferably, in the case of a chimeric polypeptide comprising transcriptional 
modulation activity, a nuclear localisation domain is attached to the DNA binding domain 
to direct the chimeric polypeptide to the nucleus.

Generally, a chimeric nucleic acid binding polypeptide, such as a chimeric zinc
finger polypeptide, can also include an effector domain to regulate gene expression. The
effector domain can be directly derived from a basal or regulated transcription factor such
as, for example, transactivators, repressors, and proteins that bind to insulator or silencer
sequences. See, for example, Choo & Klug (1995) Curr. Opin. Biotech. 6: 431-436;
Choo, Y. & Klug, A. (1997) Curr. Opin. Str. Biol. 7, 117-125; Rebar & Pabo (1994)
Science 263: 671-673; Jamieson et al. (1994) Biochem. 33: 5689-5695; Goodrich et al.
(1996) Cell 84: 825-830; Vostrov, A. A. & Quitschke, W. W. (1997) J. Biol. Chem. 272:
33353-33359 and WO 00/41566 and references disclosed therein. Other useful domains
are derived from receptors such as, for example, nuclear hormone receptors (Kumar, R. &
Thompson, E. B. (1999) Steroids 64: 310-319), and their co-activators and co-repressors
(Ugai, H. et al. (1999) J. Mol. Med. 77: 481-494).

A chimeric nucleic acid binding polypeptide can also include other domains that
may be advantageous within the context of the control of gene expression. Such domains
include, but are not limited to, protein-modifying domains such as histone
acetyltransferases, kinases, methylases and phosphatases, which can silence or activate
genes by modifying DNA structure or the proteins that associate with nucleic acids. See,
for example, Wolffe, Science 272: 371-372 (1996); Taunton et al., Science 272: 408-411
(1996); Hassig et al., Proc. Natl. Acad. Sci. USA 95: 3519-3524 (1998); Wang, Trends
Biochem. Sci. 19: 373-376 (1994); and Schonthal, Semin. Cancer Biol. 6: 239-248
(1995). Additional useful effector domains include those that modify or rearrange nucleic
acid molecules such as methyltransferases, endonucleases, ligases, recombinases etc.
See, for example, Wood, Ann. Rev. Biochem. 65: 135-167 (1996); Sadowski, FASEB J.
7: 760-767 (1993); Cheng, Curr. Opin. Struct. Biol. 5: 4-10 (1995); Wu et al. (1995)
Proc. Natl. Acad. Sci. USA 92: 344-348; Nahon & Raveh, Nucleic Acids Res. 1998 Mar
1;26(5):1233-9; Smith et al. Nucleic Acids Res. 1999 Jan 15;27(2):674-81; and Smith et
al. (2000) Nucleic Acids Res. Sept 1; 28(17):3361-9. It will be appreciated that the
biological effector domain portion of the chimeric polypeptide may itself also comprise
such activities, without the need for further additional domains.

For the purpose of gene activation, zinc finger domains may be fused to the VP64
domain. See, for example, Seipel et al., EMBO J. 11: 4961-4968 (1996). Other preferred
transactivator domains include the herpes simplex virus (HSV) VP16 domain (Hagmann
et al. (1997) J. Virol. 71: 5952-5962; Sadowski et al. (1988) Nature 335: 563-564), and
transactivation domain 1 and/or domain 2 of the p65 subunit of nuclear factor-κB (NF-κB)
(Schmitz, M. L. et al. (1995) J. Biol. Chem. 270: 15576-15584). Other transcription
factors are reviewed in, for example, Lekstrom-Himes J. & Xanthopoulos K. G. (C/EBP
family) J. Biol. Chem. 273: 28545-28548 (1998); Bieker, J. J. et al. (globin gene
transcription factors) Ann. N. Y. Acad. Sci. 850: 64-69 (1998), and Parker, M. G.
(estrogen receptors) Biochem. Soc. Symp. 63: 45-50 (1998).



Use of a transactivation domain from the estrogen receptor is disclosed in
Metivier, R., Petit, F.G., Valotaire, Y. & Pakdel, F. (2000) Mol. Endocrinol. 14: 1849-1871.
Furthermore, activation domains from the globin transcription factors EKLF
(Pandya, K., Donze, D. & Townes, T. (2001) J. Biol. Chem. 276: 8239-8243) may also be
used, as well as a transactivation domain from FKLF (Asano, H., Li, X.S. &
Stamatoyannopoulos, G. (1999) Mol. Cell. Biol. 19: 3571-3579). C/EBP transactivation
domains may also be employed in the methods described herein. The C/EBP epsilon
activation domain is disclosed in Verbeek, W., Gombart, A.F., Chumakov, A.M., Muller, C.,
Friedman, A.D. & Koeffler, H.P. (1999) Blood 15: 3327-3337. Kowenz-Leutz, E. & Leutz,
A. (1999) Mol. Cell 4: 735-743 disclose the use of the C/EBP tau activation domain,
while the C/EBP alpha transactivation domain is disclosed in Tao, H. & Umek, R.M.
(1999) DNA Cell Biol. 18: 75-84.



It is known that zinc finger proteins may be fused to transcriptional repression
domains such as the Kruppel-associated box (KRAB) domain to form powerful
repressors. These domains are known to repress expression of a reporter gene even when
bound to sites a few kilobase pairs upstream from the promoter of the gene (Margolin et
al., 1994, Proc. Natl. Acad. Sci. USA 91: 4509-4513). Hence, in certain embodiments,
the KRAB repressor domain from the human KOX-1 protein is used to repress gene
activity (Moosmann et al., Biol. Chem. 378: 669-677 (1997); Thiesen et al., New
Biologist 2: 363-374 (1990)). In additional embodiments, larger fragments of the KOX-1
protein comprising the KRAB domain, up to and including full-length KOX protein, are
used as transcriptional repression domains. See, for example, Abrink et al. (2001) Proc.
Natl. Acad. Sci. USA 98: 1422-1426. Other preferred transcriptional repressor domains
are known in the art and include, for example, the engrailed domain (Han et al., EMBO J.
12: 2723-2733 (1993)), the snag domain (Grimes et al., Mol. Cell. Biol. 16: 6263-6272
(1996)) and the transcriptional repression domain of v-erbA (e.g., Urnov et al. (2000)
EMBO J. 19: 4074-4090; Sap et al. (1989) Nature 340: 242-244 and Ciana et al. (1999)
EMBO J. 17: 7382-7394).

Biological effector domains can be covalently or non-covalently linked to a
binding domain. In one embodiment, a covalent linker comprises a flexible amino acid
sequence; fusion polypeptides according to this embodiment comprise a nucleic acid
binding domain fused, by an amino acid linker, to a biological effector domain.
Alternatively, a covalent linker may comprise a synthetic, non-amino acid based,
chemical linker, for example, polyethylene glycol. Synthetic linkers are commercially
available, and methods of chemical conjugation are known in the art. Covalent linkers
may comprise flexible or structured linkers, as described above.

Non-covalent linkages between a nucleic acid binding domain and an effector
domain can be formed using, for example, leucine zipper/coiled coil domains, or other
naturally occurring or synthetic dimerisation domains. See, e.g., Luscher, B. & Larsson,
L. G. Oncogene 18: 2955-2966 (1999) and Gouldson, P. R. et al.,
Neuropsychopharmacology 23: S60-S77 (2000).



The expression of composite binding polypeptides (for example, zinc finger
polypeptides) can be controlled by tissue specific promoter sequences such as, for
example, the lck promoter (thymocytes, Gu, H. et al., Science 265: 103-106 (1994)); the
human CD2 promoter (T-cells and thymocytes, Zhumabekov, T. et al., J. Immunological
Methods 185: 133-140 (1995)); the alpha A-crystallin promoter (eye lens, Lakso, M. et
al., Proc. Natl. Acad. Sci. 89: 6232-6236 (1992)); the alpha-calcium-calmodulin-dependent
kinase II promoter (hippocampus and neocortex, Tsien, J. et al., Cell 87: 1327-1338
(1996)); the whey acidic protein promoter (mammary gland, Wagner, K.-U. et al.,
Nucleic Acids Res. 25: 4323-4330 (1997)); the aP2 enhancer/promoter (adipose tissue,
Barlow C. et al., Nucleic Acids Res. 25: 2543-2545 (1997)); the aquaporin-2 promoter
(renal collecting duct, Nelson R. et al., Am. J. Physiol. 275: C216-C226 (1998)); and the
mouse myogenin promoter (skeletal muscle, Grieshammer, U. et al., Dev. Biol. 197:
234-247 (1998)). The expression of such polypeptides can also be controlled by inducible
systems, in particular by small molecule induction, such as the tetracycline-controlled
systems (tet-on and tet-off) and the RU-486 or tamoxifen hormone analogue
systems, or by the radiation-inducible early growth response gene-1 (EGR1) promoter.
These promoter constructs and inducible systems have the benefit of being able to
provide organ-specific and/or inducible expression of target genes for use in applications
such as gene therapy and transgenic animals.

e. Vectors



The nucleic acid encoding the nucleic acid binding polypeptide such as a zinc 
finger polypeptide can be incorporated into intermediate vectors and transformed into 
prokaryotic or eukaryotic cells for expression or DNA amplification. 

As used herein, vector (or plasmid) preferably refers to discrete elements that are
used to introduce heterologous nucleic acid into cells for either expression or replication
thereof. The term "heterologous to the cell" means that the sequence does not naturally
exist in the genome of the host cell but has been introduced into the cell. The term
"introduced into" means that a procedure is performed on a cell, tissue, organ or organism
such that the gene encoding the nucleic acid binding polypeptide (for example, a zinc
finger polypeptide) previously absent from the cell or cells is then present in the cell or
cells. Alternatively, or in addition, the gene may be initially present in the cell or cells
and subsequently altered by introduction of heterologous DNA. A heterologous sequence
may include a modified sequence introduced at any chromosomal site, or which is not
integrated into a chromosome, or which is introduced by homologous recombination such
that it is present in the genome in the same position as the native allele. Selection and use
of such vectors are well within the skill of the person of ordinary skill in the art. Many
vectors are available, and selection of an appropriate vector will depend on the intended
use of the vector (i.e. whether it is to be used for DNA amplification or for nucleic acid
expression), the size of the DNA to be inserted into the vector, and the host cell to be
transformed with the vector. Another consideration is whether the vector is to remain
episomal or integrate into the host genome. Suitable vectors may be of bacterial, viral,
insect or mammalian origin. Intermediate vectors for storage or manipulation of the
nucleic acid encoding the nucleic acid binding polypeptide, or for expression and
purification of the polypeptide, are typically of prokaryotic origin. Most expression
vectors are shuttle vectors, i.e. they are capable of replication in at least one class of
organisms but can be transfected into another class of organisms for expression. For
example, a vector is cloned in E. coli and then the same vector is transfected into yeast or
mammalian cells even though it is not capable of replicating independently of the host
cell chromosome. DNA may also be replicated by insertion into the host genome. Nucleic
acids encoding the nucleic acid binding polypeptides described here, such as zinc finger
polypeptides, are preferably inserted into a vector suitable for expression in mammalian
cells.

Prokaryote, yeast and higher eukaryote cells may be used for replicating DNA and
producing the nucleic acid binding protein. Suitable prokaryotes include eubacteria, such
as Gram-negative or Gram-positive organisms, such as E. coli, e.g. E. coli K-12 strains,
DH5α and HB101, or Bacilli. Further hosts suitable for the vectors include eukaryotic
microbes such as filamentous fungi or yeast, e.g. Saccharomyces cerevisiae. Higher
eukaryotic cells include insect and vertebrate cells, particularly mammalian cells
including human cells or nucleated cells from other multicellular organisms. In recent
years propagation of vertebrate cells in culture (tissue culture) has become a routine
procedure. Examples of useful mammalian host cell lines are epithelial or fibroblastic cell
lines such as Chinese hamster ovary (CHO) cells, NIH 3T3 cells, HeLa cells or 293T
cells. The host cells referred to in this disclosure comprise cells in in vitro culture as well
as cells that are within a host animal.

Each vector contains various components depending on its function (amplification
of DNA or expression of DNA) and the host cell with which it is compatible. The vector
components generally include, but are not limited to, one or more of the following: an
origin of replication, one or more selectable marker genes, a promoter, an enhancer
element, a transcription termination sequence and a signal sequence.

Both expression and cloning vectors generally contain nucleic acid sequences that
enable the vector to replicate in one or more selected host cells. Typically in cloning
vectors, this sequence is one that enables the vector to replicate independently of the host
chromosomal DNA, and includes origins of replication or autonomously replicating
sequences. Such sequences are well known for a variety of bacteria, yeast and viruses.
The origin of replication from the plasmid pBR322 is suitable for most Gram-negative
bacteria, the 2μ plasmid origin is suitable for yeast, and various viral origins (e.g. SV40,
polyoma, adenovirus) are useful for cloning vectors in mammalian cells. Generally, the
origin of replication component is not needed for mammalian expression vectors unless
these are used in mammalian cells competent for high level DNA replication, such as
COS cells.

Advantageously, an expression and cloning vector contains a selection gene, also
referred to as a selectable marker. This gene encodes a protein necessary for the survival or
growth of transformed host cells grown in a selective culture medium. Host cells not
transformed with the vector containing the selection gene will not survive in the culture
medium. Typical selection genes encode proteins that confer resistance to antibiotics and
other toxins (e.g. ampicillin, neomycin, methotrexate or tetracycline), complement
auxotrophic deficiencies, or supply critical nutrients not available from complex media.

Since the replication of vectors is conveniently done in E. coli, an E. coli genetic
marker and an E. coli origin of replication are advantageously included. These can be
obtained from E. coli plasmids, such as pBR322, Bluescript© vector or a pUC plasmid,
e.g. pUC18 or pUC19, which contain both an E. coli replication origin and an E. coli
genetic marker conferring resistance to antibiotics, such as ampicillin and tetracycline.
Vectors such as these are commercially available.

As to a selective gene marker appropriate for yeast, any marker gene can be used
which facilitates the selection for transformants due to the phenotypic expression of the
marker gene. Suitable markers for yeast are, for example, those conferring resistance to
the antibiotics G418, hygromycin or bleomycin, or those providing prototrophy in an
auxotrophic yeast mutant, for example the URA3, LEU2, LYS2, TRP1, or HIS3 gene.

Suitable selectable markers for mammalian cells are those that enable the
identification of cells competent to take up nucleic acid, such as dihydrofolate reductase
(DHFR, methotrexate resistance), thymidine kinase, or genes conferring resistance to
neomycin, G418 or hygromycin. The mammalian cell transformants are placed under
selection pressure that only those transformants which have taken up and are
expressing the marker are adapted to survive. In the case of a DHFR or
glutamine synthetase (GS) marker, selection pressure can be imposed by culturing the
transformants under conditions in which the pressure is progressively increased, thereby
leading to amplification (at its chromosomal integration site) of both the selection gene
and the linked DNA that encodes the nucleic acid binding protein. Amplification is the
process by which genes in greater demand (such as one encoding a protein that is critical
for growth), together with closely associated genes (such as one encoding a composite
binding polypeptide), are reiterated in tandem within the chromosomes of recombinant
cells. Increased quantities of desired protein are usually synthesised from this amplified
DNA.

Expression and cloning vectors usually contain control sequences that are
recognised by the host organism and are operably linked to the nucleic acid encoding a
nucleic acid binding polypeptide. The term "control sequences" is intended to include, at
a minimum, components whose presence can influence expression, and can also include
additional components whose presence is advantageous, for example, leader sequences
and fusion partner sequences. The term "operably linked" means that the components
described are in a relationship permitting them to function in their intended manner.
Typical control sequences include promoters, enhancers and other expression regulation
signals such as terminators. Such a promoter may be inducible or constitutive. A
regulatory sequence operably linked to a coding sequence is ligated in such a way that
expression of the coding sequence is achieved under conditions compatible with the
control sequences.

The term promoter is well known in the art and encompasses nucleic acid regions
ranging in size and complexity from minimal promoters to promoters including upstream
elements and enhancers. Suitable promoters for use in prokaryotic and eukaryotic cells
are well known in the art, and described in, for example, Current Protocols in Molecular
Biology (Ausubel et al., eds., 1994) and Molecular Cloning: A Laboratory Manual
(Sambrook et al., 2nd ed. 1989).

Promoters suitable for use with prokaryotic hosts include, for example, the
β-lactamase and lactose promoter systems, alkaline phosphatase, the tryptophan (Trp)
promoter system and hybrid promoters such as the tac promoter. Their nucleotide
sequences have been published, thereby enabling the skilled worker to ligate them to
DNA encoding a composite binding protein, using linkers or adapters to supply any
required restriction sites. Promoters for use in bacterial systems will also generally
contain an adjacent ribosome binding site (e.g., a Shine-Dalgarno sequence) operably
linked to the DNA encoding the composite binding polypeptide.

Preferred expression vectors are bacterial expression vectors, which comprise a
promoter of a bacteriophage such as phage lambda, SP6, T3 or T7, for example, which is
capable of functioning in bacteria. In one of the most widely used expression systems,
the nucleic acid encoding the fusion protein can be transcribed from a vector by T7 RNA
polymerase (Studier et al., Methods in Enzymol. 185: 60-89, 1990). In the E. coli
BL21(DE3) host strain, used in conjunction with pET vectors, the T7 RNA polymerase is
produced from the λ-lysogen DE3 in the host bacterium, and its expression is under the
control of the IPTG-inducible lacUV5 promoter. This system has been employed
successfully for over-production of many proteins. Alternatively, the polymerase gene
may be introduced on a lambda phage by infection with an int- phage such as the CE6
phage, which is commercially available (Novagen, Madison, WI, USA). Other vectors
include vectors containing the lambda PL promoter such as pLEX (Invitrogen, NL),
vectors containing the trc promoters such as pTrcHisXpress™ (Invitrogen), or pTrc99
(Pharmacia Biotech, SE), or vectors containing the tac promoter such as pKK223-3
(Pharmacia Biotech), or pMAL (New England Biolabs, Beverly, MA, USA). A suitable
vector for expression of proteins in mammalian cells is the CMV enhancer-based vector
such as pEVRF (Matthias, et al. (1989) Nucleic Acids Res. 17, 6418).

Suitable promoting sequences for use with yeast hosts may be regulated or
constitutive and are preferably derived from a highly expressed yeast gene, especially a
Saccharomyces cerevisiae gene. Thus, the promoter of the TRP1 gene, the ADHI or
ADHII gene, the acid phosphatase (PHO5) gene, a promoter of the yeast mating
pheromone genes coding for the a- or α-factor or a promoter derived from a gene
encoding a glycolytic enzyme such as the promoter of the enolase, glyceraldehyde-3-phosphate
dehydrogenase (GAP), 3-phosphoglycerate kinase (PGK), hexokinase,
pyruvate decarboxylase, phosphofructokinase, glucose-6-phosphate isomerase,
3-phosphoglycerate mutase, pyruvate kinase, triose phosphate isomerase, phosphoglucose
isomerase or glucokinase genes, or a promoter from the TATA binding protein (TBP)
gene can be used. Furthermore, it is possible to use hybrid promoters comprising
upstream activation sequences (UAS) of one yeast gene and downstream promoter
elements including a functional TATA box of another yeast gene, for example a hybrid
promoter including the UAS(s) of the yeast PHO5 gene and downstream promoter
elements including a functional TATA box of the yeast GAP gene (PHO5-GAP hybrid
promoter). A suitable constitutive PHO5 promoter is, for example, a shortened acid
phosphatase PHO5 promoter devoid of the upstream regulatory elements (UAS) such as
the PHO5 (-173) promoter element starting at nucleotide -173 and ending at nucleotide -9
of the PHO5 gene.

The promoter is typically selected from promoters which are found in animal 
cells, although prokaryotic promoters and promoters functional in other eukaryotic cells 
can be used. Typically, the promoter is derived from viral or animal gene sequences, may 
be constitutive or inducible, and may be strong or weak. 

Viral promoters can be derived from viruses such as polyoma virus, adenoviruses,
adeno-associated viruses, poxviruses (e.g., fowlpox virus), papilloma viruses (e.g., BPV),
avian sarcoma virus, cytomegalovirus (CMV), herpesviruses, retroviruses, lentiviruses
and simian virus 40 (SV40). An example of a relatively weak viral promoter is the thymidine
kinase promoter from herpes simplex virus (HSV-TK).

Mammalian derived promoters can be heterologous to the animal in which
composite binding polypeptide (such as zinc finger polypeptide) expression is to occur, or
they can be host sequences. In some applications it is preferable to use a promoter that is
active in all cell types; however, it is often preferable to use promoter sequences that are
active in specific cell types only.

The actin promoter and the strong ribosomal protein promoter are examples of
promoter sequences that are active in all cell types. In contrast, by using promoters that
are specific for certain cell or tissue types, the gene encoding the nucleic acid binding
polypeptide can be expressed only in the required cell or tissue types. This may be of
extreme importance for applications such as gene therapy, and for the production of
viable transgenic animals. Such promoters are known in the art and include the lck
promoter (thymocytes, Gu, H. et al., Science 265: 103-106 (1994)); the human CD2
promoter (T-cells and thymocytes, Zhumabekov, T. et al., J. Immunological Methods
185: 133-140 (1995)); the alpha A-crystallin promoter (eye lens, Lakso, M. et al., Proc.
Natl. Acad. Sci. 89: 6232-6236 (1992)); the alpha-calcium-calmodulin-dependent kinase
II promoter (hippocampus and neocortex, Tsien, J. et al., Cell 87: 1327-1338 (1996)); the
whey acidic protein promoter (mammary gland, Wagner, K.-U. et al., Nucleic Acids Res.
25: 4323-4330 (1997)); the aP2 enhancer/promoter (adipose tissue, Barlow C. et al.,
Nucleic Acids Res. 25: 2543-2545 (1997)); the aquaporin-2 promoter (renal collecting
duct, Nelson R. et al., Am. J. Physiol. 275: C216-C226 (1998)); the mouse myogenin
promoter (skeletal muscle, Grieshammer, U. et al., Dev. Biol. 197: 234-247 (1998)); and the
retinoblastoma gene promoter (nervous system, Jiang, Z. et al., J. Biol. Chem. 276:
593-600 (2001)).
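
The tissue specificities listed above can be held in a simple lookup table when choosing a promoter for a construct. The following is a minimal sketch only: the promoter/tissue pairs are taken from the list above, while the dictionary name, the function name and the exact tissue strings are illustrative assumptions rather than part of the disclosure.

```python
# Tissue specificities as stated in the text above; the data structure and
# all names below are hypothetical, chosen only for illustration.
TISSUE_SPECIFIC_PROMOTERS = {
    "lck": ["thymocytes"],
    "human CD2": ["T-cells", "thymocytes"],
    "alpha A-crystallin": ["eye lens"],
    "alpha-calcium-calmodulin-dependent kinase II": ["hippocampus", "neocortex"],
    "whey acidic protein": ["mammary gland"],
    "aP2 enhancer/promoter": ["adipose tissue"],
    "aquaporin-2": ["renal collecting duct"],
    "mouse myogenin": ["skeletal muscle"],
    "retinoblastoma gene": ["nervous system"],
}


def promoters_for_tissue(tissue):
    """Return the listed promoters reported as active in the given tissue."""
    wanted = tissue.lower()
    return sorted(
        name
        for name, tissues in TISSUE_SPECIFIC_PROMOTERS.items()
        if wanted in (t.lower() for t in tissues)
    )


if __name__ == "__main__":
    print(promoters_for_tissue("thymocytes"))
```

A tissue that does not appear in the list simply yields an empty result, in which case a ubiquitously active or inducible promoter, as discussed in the surrounding text, may be considered instead.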

The expression of nucleic acid binding polypeptides such as zinc finger
polypeptides can also be controlled by small molecule induction or other inducible
systems such as the tetracycline inducible systems (tet-on and tet-off), the RU-486 or
tamoxifen hormone analogue systems, or the radiation-inducible early growth response
gene-1 (EGR1) promoter, all of which are commercially available. By using such
inducible promoter systems, transgenic lines can be established which carry a zinc finger
chimeric polypeptide but express it only after addition of an inducer molecule. Thus the
genes encoding the zinc finger polypeptides or other nucleic acid binding polypeptides
can be expressed (or not expressed) in response to the small molecule, which can be
easily administered. These systems may also allow the time and amount of polypeptide
expression to be regulated.

Expression vectors typically contain expression cassettes that carry all the
additional elements required for efficient expression of the nucleic acid in the host cell.
Additional elements include enhancer sequences, polyadenylation and transcriptional
termination signals, ribosome binding sites, and translational termination sequences.

Transcription of DNA by higher eukaryotes may be increased by inserting an
enhancer sequence into the vector. Enhancers are relatively orientation and position
independent. Many enhancer sequences are known from mammalian genes (e.g. elastase
and globin). However, typically one will employ an enhancer from a eukaryotic cell
virus. Examples include the SV40 enhancer on the late side of the replication origin
(approx. bp 100-270) and the CMV early promoter enhancer. The enhancer may be
spliced into the vector at a position 5' or 3' to the gene encoding the zinc finger
polypeptide or nucleic acid binding polypeptide, but is preferably located at a site 5' from
the promoter.

It has also been shown that the expression of a heterologous gene in an animal cell
may be enhanced by retaining intron sequences (as opposed to using a cDNA clone). For
example, intron 1 of the human CD2 gene has been shown to enhance the level of
expression of CD2 in human cells (Festenstein, R. et al. 1996 Science 271: 1123).

Advantageously, a eukaryotic expression vector encoding a nucleic acid binding
protein may comprise a locus control region (LCR). LCRs are capable of directing
high-level integration site-independent expression of transgenes integrated into host cell
chromatin. This is particularly important where the gene encoding the zinc finger
polypeptide or the nucleic acid binding polypeptide is to be expressed over extended
periods of time, for applications such as transgenic animals and gene therapy, as gene
silencing of integrated heterologous DNA, especially of viral origin, is known to occur
(Palmer, T. D. et al., Proc. Natl. Acad. Sci. USA 88: 1330-1334 (1991); Harpers, K. et al.,
Nature 293: 540-542 (1981); Jahner, D. et al., Nature 298: 623-628 (1992); and Chen, W.
Y. et al., Proc. Natl. Acad. Sci. USA 94: 5798-5803 (1997)). Typical LCRs are
exemplified by the human β-globin cluster, and the HS-40 regulatory region from the
α-globin locus.

Eukaryotic vectors may also contain sequences necessary for the termination of
transcription and for stabilising the mRNA transcript. Such sequences are commonly
available from the 5' and 3' untranslated regions of eukaryotic or viral DNAs, and are
known in the art. These regions contain nucleotide segments transcribed as
polyadenylated fragments in the untranslated portion of the mRNA encoding the relevant
polypeptide. An appropriate terminator of transcription is fused downstream of the gene
encoding the selected nucleic acid binding polypeptide such as a zinc finger protein. Any
of a number of known transcriptional terminators, RNA polymerase pause sites and
polyadenylation enhancing sequences can be used at the 3' end of the nucleic acid
encoding, for example, a zinc finger polypeptide (see, for example, Richardson, J. P. Crit.
Rev. Biochem. Mol. Biol. 28: 1-30 (1993); Yonaha M. & Proudfoot, N. J. EMBO J. 19:
3770-3777 (2000); Ashfield, R. et al., EMBO J. 10: 4197-4207 (1991); Hirose, Y. &
Manley, J. L. Nature 395: 93-96 (1998)).

The nucleic acid binding polypeptides are generally targeted to the cell nucleus so
that they are able to interact with host cell DNA and bind to the appropriate DNA target
in the nucleus and regulate transcription. To effect this, a nuclear localisation sequence
(NLS) is incorporated in frame with the expressible nucleic acid binding polypeptide
(e.g., zinc finger polypeptide) gene construct. The NLS can be fused either 5' or 3' to the
sequence encoding the binding protein, but preferably it is fused to the C-terminus of the
chimeric polypeptide.
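
The requirement that the NLS be incorporated "in frame" can be checked mechanically before a construct is ordered or cloned. The sketch below is illustrative only and rests on stated assumptions: both coding sequences start at codon position 1 and carry no stop codon, the NLS is placed 3' of the binding domain (the preferred C-terminal arrangement), and a single stop codon is appended. The function names, the placeholder binding-domain sequence and the particular codons chosen to encode the SV40 NLS peptide (PKKKRKV) are hypothetical and are not sequences from this disclosure.

```python
STOP_CODONS = {"TAA", "TAG", "TGA"}


def codons(seq):
    """Split a DNA coding sequence into successive three-base codons."""
    return [seq[i:i + 3] for i in range(0, len(seq), 3)]


def fuse_in_frame(binding_cds, nls_cds):
    """Append an NLS coding sequence 3' of a binding-domain coding sequence.

    Raises ValueError if either part would shift the reading frame or
    terminate translation early; a single stop codon is added at the end.
    """
    parts = []
    for name, cds in (("binding domain", binding_cds), ("NLS", nls_cds)):
        cds = cds.upper()
        if len(cds) % 3 != 0:
            raise ValueError(f"{name} length {len(cds)} is not a multiple of 3")
        if any(codon in STOP_CODONS for codon in codons(cds)):
            raise ValueError(f"{name} contains an in-frame stop codon")
        parts.append(cds)
    return "".join(parts) + "TAA"


# One possible encoding of the SV40 large T-antigen NLS peptide PKKKRKV
# (codon choice is an assumption made for this example).
SV40_NLS_CDS = "CCAAAGAAGAAGCGTAAGGTT"
# Hypothetical placeholder standing in for a zinc finger coding sequence.
EXAMPLE_BINDING_CDS = "ATGGCTGAAAAACCATAC"

if __name__ == "__main__":
    print(fuse_in_frame(EXAMPLE_BINDING_CDS, SV40_NLS_CDS))
```

An N-terminal fusion would be checked in the same way, with the order of the two parts reversed and the start codon carried by the NLS-encoding part.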

The NLS of the wild-type Simian Virus 40 Large T-Antigen (Kalderon et al.
(1984) Cell 37: 801-813; and Markland et al. (1987) Mol. Cell. Biol. 7: 4255-4265) is an
appropriate NLS and provides an effective nuclear localisation mechanism in animals.
However, several alternative NLSs are known in the art and can be used instead of the
SV40 NLS sequence. These include the NLSs of TGA-1A and TGA-1B.

Composite binding polypeptides can comprise tag sequences to facilitate studies
and/or preparation of such molecules. Tag sequences may include FLAG-tags, myc-tags,
6his-tags, hemagglutinin tags or any other suitable tag known in the art.

Moreover, the nucleic acid binding protein gene according to the invention
preferably includes a secretion sequence in order to facilitate secretion of the polypeptide
from bacterial hosts, such that it will be produced as a soluble native peptide rather than
in an inclusion body. The peptide may be recovered from the bacterial periplasmic space,
or the culture medium, as appropriate.

Construction of vectors employs conventional ligation techniques. Isolated
plasmids or DNA fragments are cleaved, tailored, and religated in the form desired to
generate the plasmids required. If desired, analysis to confirm correct sequences in the
constructed plasmids is performed in a known fashion. Suitable methods for constructing
expression vectors, preparing in vitro transcripts, introducing DNA into host cells, and
performing analyses for assessing nucleic acid binding protein expression and function
are known to those skilled in the art. Gene presence, amplification and/or expression
may be measured in a sample directly, for example, by conventional Southern blotting,
Northern blotting to quantify the transcription of mRNA, dot blotting (DNA or RNA
analysis), or in situ hybridisation, using an appropriately labelled probe which may be
based on a sequence provided herein. Those skilled in the art will readily envisage how
these methods may be modified, if desired.

f. Applications of Composite Binding Polypeptides 

Nucleic acid binding proteins according to the invention can be employed in a wide
variety of applications, including diagnostics and as research tools, and also in therapeutic
applications and in transgenic organisms.

In Vitro Applications 

Poly-zinc finger peptides of this invention may be employed as diagnostic tools for 
identifying the presence of nucleic acid molecules in a complex mixture. Nucleic acid 
binding molecules according to the invention can differentiate single base pair changes in 
target nucleic acid molecules. 

Accordingly, the invention provides methods for determining the presence of a target
nucleic acid molecule, wherein the target nucleic acid molecule comprises a target
sequence, comprising the steps of:

a) preparing a nucleic acid binding protein, by a method set forth above, which is specific
for the target nucleic acid sequence;

b) exposing a test system to the nucleic acid binding protein under conditions which
promote binding of the protein to the target sequence, and removing any nucleic acid
binding protein which remains unbound;

c) testing for the presence of the nucleic acid binding protein in the test system;

wherein, if the nucleic acid binding protein is detected, the target nucleic acid molecule is
present and, if the nucleic acid binding protein is not detected, the target nucleic acid
molecule is not present. In additional embodiments, quantitation of the amount of nucleic
acid binding protein allows quantitation of the amount of the target nucleic acid molecule
present in the test system.
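
The read-out logic of step c), and of the quantitation embodiment, can be sketched as follows. This is a hedged illustration rather than a prescribed assay: the fold-over-background cutoff (`margin`), the assumption that signal rises linearly with the amount of bound protein, and the calibration constant `signal_per_unit` are all hypothetical parameters that would have to be established empirically for a given detection system.

```python
def target_present(signal, background, margin=3.0):
    """Step c): report detection of bound nucleic acid binding protein.

    `signal` is the reading from the test system after unbound protein has
    been removed; `background` is the reading from a target-free control.
    `margin` is a hypothetical fold-over-background cutoff.
    """
    return signal > margin * background


def estimate_target_amount(signal, background, signal_per_unit):
    """Estimate the amount of target from the bound-protein signal.

    Assumes a linear response; `signal_per_unit` is a hypothetical
    calibration constant obtained from standards of known target amount.
    """
    if signal_per_unit <= 0:
        raise ValueError("signal_per_unit must be positive")
    return max(signal - background, 0.0) / signal_per_unit


if __name__ == "__main__":
    reading, blank = 1.5, 0.25
    if target_present(reading, blank):
        print("target detected, estimated amount:",
              estimate_target_amount(reading, blank, signal_per_unit=0.25))
    else:
        print("target not detected")
```

Keeping the presence call separate from the quantity estimate mirrors the text, in which detection alone establishes presence and quantitation is an additional embodiment.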

In a preferred embodiment, the nucleic acid binding molecules of the invention can be
incorporated into an ELISA assay. For example, phage displaying composite binding
polypeptides can be used to detect the presence of the target nucleic acid, and visualised
using enzyme-linked anti-phage antibodies.

Further improvements to the use of phage expressing a composite binding polypeptide for
diagnosis can be made, for example, by co-expressing a marker protein fused to the minor
coat protein (gVIII) of a filamentous bacteriophage. Since detection with an anti-phage
antibody would then be unnecessary, the time and cost of each diagnosis would be further
reduced. Depending on the requirements, suitable markers for display might include
fluorescent proteins (A. B. Cubitt, et al., (1995) Trends Biochem. Sci. 20, 448-455; T. T.
Yang, et al., (1996) Gene 173, 19-23), or an enzyme such as alkaline phosphatase (J.
McCafferty, R. H. Jackson, D. J. Chiswell, (1991) Protein Engineering 4, 955-961).
Labelling different types of diagnostic phage with distinct markers would allow multiplex
screening of a single nucleic acid sample. Nevertheless, even in the absence of such
refinements, the basic ELISA technique is reliable, fast, simple and particularly
inexpensive. Moreover it requires no specialised apparatus, nor does it employ hazardous
reagents such as radioactive isotopes, making it amenable to routine use in the clinic. The
major advantage of the protocol is that it obviates the requirement for gel electrophoresis,
and so opens the way to automated nucleic acid diagnosis.

The invention provides nucleic acid binding proteins that have exquisite specificity. The
invention lends itself, therefore, to the design of any molecule for which specific nucleic
acid binding is required. For example, the proteins according to the invention may be
employed in the manufacture of chimeric restriction enzymes, in which a nucleic acid
cleaving domain is fused to a nucleic acid binding domain comprising a zinc finger as
described herein.

In Vivo Applications 

The invention further provides composite binding polypeptides (and nucleic acids
encoding them) that may be used in transgenic organisms (such as non-human animals),
as therapeutic agents, and in gene therapy applications.

A transgenic animal is an animal, preferably a non-human animal, containing at least one
foreign gene, called a transgene, in its genetic material. Preferably, the transgene is
contained in the animal's germ line such that it can be transmitted to the animal's
offspring. Transgenic animals may carry the transgene in all their cells or may be
genetically mosaic.

Constructs useful for creating transgenic animals according to the invention comprise
genes encoding nucleic acid binding polypeptides, optionally under the control of nucleic
acid sequences directing their expression in cells of a particular lineage. Alternatively,
nucleic acid binding polypeptide encoding constructs may be under the control of
non-lineage-specific promoters, and/or inducibly regulated. Typically, DNA fragments on the
order of 10 kilobases or less are used to construct a transgenic animal (Reeves, 1998,
New. Anat. 253: 19). A transgenic animal expressing one transgene can be crossed to a
second transgenic animal expressing a second transgene such that their offspring will carry
both transgenes.

Although the majority of previous studies have involved transgenic mice, other species of
transgenic animal have also been produced, such as rabbits, sheep, pigs (Hammer et al.,
1985, Nature 315: 680-683; Kumar, et al., U.S. Patent No. 5,922,854; Seebach, et al., U.S.
Patent No. 6,030,833) and chickens (Salter et al., 1987, Virology 157: 236-240). Transgenic
animals are currently being developed to serve as bioreactors for the production of useful
pharmaceutical compounds (Van Brunt, 1988, Bio/Technology 6: 1149-1154; Wilmut, et
al., 1988, New Scientist (July 7 issue) pp. 56-59). Up-regulation of endogenous or
exogenous genes expressing useful polypeptides, such as therapeutic polypeptides, by
means of a heterologous nucleic acid binding polypeptide, may be used to produce such
polypeptides in transgenic animals. Preferably, the polypeptides are secreted into an
extractable fluid, such as blood or mammary fluid (milk), to enable easy isolation of the
polypeptide.


Furthermore, the invention provides the use of polypeptide fusions comprising an 
integrase, such as a viral integrase, and a nucleic acid binding protein according to the 
invention to target nucleic acid sequences in vivo (Bushman, (1994) PNAS (USA) 
91:9233-9237). In gene therapy applications, the method may be applied to the delivery 
of functional genes into defective genes, or the delivery of a heterologous nucleic acid in 
order to disrupt an endogenous gene. Alternatively, genes may be delivered to known, 
repetitive stretches of nucleic acid, such as centromeres, together with an activating 
sequence such as an LCR. This would represent a route to the safe and predictable 
incorporation of nucleic acid into the genome. 


In conventional therapeutic applications, nucleic acid binding proteins according to this 
embodiment may be used to specifically eliminate cells having mutant vital proteins. For 
example, if a mutant ras gene is targeted, cells comprising this mutant gene will be 
destroyed because ras is essential to cellular survival. Alternatively, the action of 
transcription factors can be modulated, preferably reduced, by administering to the cell 
agents which bind to the binding site specific for the transcription factor. For example, 
the activity of HIV tat may be reduced by binding proteins specific for HIV TAR. 

Moreover, binding proteins according to the invention can be coupled to toxic molecules, 
such as nucleases, which are capable of causing irreversible nucleic acid damage and cell 
death. Such agents are capable of selectively destroying cells that comprise a mutation in 
their endogenous nucleic acid. 

Nucleic acid binding proteins and derivatives thereof as set forth above may also be 
applied to the treatment of infections and the like in the form of organism-specific 
antibiotic or antiviral drugs. In such applications, the binding proteins can be coupled to 
a nuclease or other nuclear toxin and targeted specifically to the nucleic acids of 
microorganisms. 

Transgenic animals comprising transgenes, optionally integrated within the genome, and 
expressing heterologous zinc finger and other nucleic acid binding polypeptides from 
transgenes, may be created by a variety of methods. Methods for producing transgenic 
animals are known in the art, and are described by Gordon, J. & Ruddle, F.H. Science 
214: 1244-1246 (1981); Jaenisch, R. Proc. Natl. Acad. Sci. USA 73: 1260-1264 (1976); 
Gossler et al., (1986) Proc. Natl. Acad. Sci. USA 83:9065-9069; Hogan et al., 
Manipulating the Mouse Embryo: A Laboratory Manual, (1988); and U.S. Pat. Nos. 
5,175,384; 5,434,340 and 5,591,669. 

Pharmaceutical Preparations 


The invention likewise relates to pharmaceutical preparations which contain the 
compounds according to the invention or pharmaceutically acceptable salts thereof as 
active ingredients, and to processes for their preparation. 






The pharmaceutical preparations according to the invention which contain the compound 
according to the invention or pharmaceutically acceptable salts thereof are those for 
enteral, such as oral, furthermore rectal, and parenteral administration to (a) 
warm-blooded animal(s), the pharmacologically active ingredient being present on its own or 
together with a pharmaceutically acceptable carrier. The daily dose of the active 
ingredient depends on the age and the individual condition and also on the manner of 
administration. 


The novel pharmaceutical preparations contain, for example, from about 10% to about 
80% (or any integral percentage therebetween), preferably from about 20% to about 60%, 
of the active ingredient. Pharmaceutical preparations according to the invention for 
enteral or parenteral administration are, for example, those in unit dose forms, such as 
sugar-coated tablets, tablets, capsules or suppositories, and furthermore ampoules. These 
are prepared in a manner known per se, for example by means of conventional mixing, 
granulating, sugar-coating, dissolving or lyophilising processes. Thus, pharmaceutical 
preparations for oral use can be obtained by combining the active ingredient with solid 
carriers, if desired granulating a mixture obtained, and processing the mixture or granules, 
if desired or necessary, after addition of suitable excipients to give tablets or sugar-coated 
tablet cores. 



Suitable carriers are, in particular, fillers, such as sugars, for example lactose, sucrose, 
mannitol or sorbitol, cellulose preparations and/or calcium phosphates, for example 
tricalcium phosphate or calcium hydrogen phosphate, furthermore binders, such as starch 
paste, using, for example, corn, wheat, rice or potato starch, gelatin, tragacanth, 
methylcellulose and/or polyvinylpyrrolidone, if desired, disintegrants, such as the 
abovementioned starches, furthermore carboxymethyl starch, crosslinked 
polyvinylpyrrolidone, agar, alginic acid or a salt thereof, such as sodium alginate; 
auxiliaries are primarily glidants, flow-regulators and lubricants, for example silicic acid, 
talc, stearic acid or salts thereof, such as magnesium or calcium stearate, and/or 
polyethylene glycol. Sugar-coated tablet cores are provided with suitable coatings which, 
if desired, are resistant to gastric juice, using, inter alia, concentrated sugar solutions 
which, if desired, contain gum arabic, talc, polyvinylpyrrolidone, polyethylene glycol 
and/or titanium dioxide, coating solutions in suitable organic solvents or solvent mixtures 
or, for the preparation of gastric juice-resistant coatings, solutions of suitable cellulose 
preparations, such as acetylcellulose phthalate or hydroxypropylmethylcellulose 
phthalate. Colorants or pigments, for example to identify or to indicate different doses of 
active ingredient, may be added to the tablets or sugar-coated tablet coatings. 

Other orally utilisable pharmaceutical preparations are hard gelatin capsules, and also soft 
closed capsules made of gelatin and a plasticiser, such as glycerol or sorbitol. The hard 
gelatin capsules may contain the active ingredient in the form of granules, for example in 
a mixture with fillers, such as lactose, binders, such as starches, and/or lubricants, such as 
talc or magnesium stearate, and, if desired, stabilisers. In soft capsules, the active 
ingredient is preferably dissolved or suspended in suitable liquids, such as fatty oils, 
paraffin oil or liquid polyethylene glycols, it also being possible to add stabilisers. 

Suitable rectally utilisable pharmaceutical preparations are, for example, suppositories, 
which consist of a combination of the active ingredient with a suppository base. Suitable 
suppository bases are, for example, natural or synthetic triglycerides, paraffin 
hydrocarbons, polyethylene glycols or higher alkanols. Furthermore, gelatin rectal 
capsules which contain a combination of the active ingredient with a base substance may 
also be used. Suitable base substances are, for example, liquid triglycerides, polyethylene 
glycols or paraffin hydrocarbons. 

Suitable preparations for parenteral administration are primarily aqueous solutions of an 
active ingredient in water-soluble form, for example a water-soluble salt, and furthermore 
suspensions of the active ingredient, such as appropriate oily injection suspensions, using 
suitable lipophilic solvents or vehicles, such as fatty oils, for example sesame oil, or 
synthetic fatty acid esters, for example ethyl oleate or triglycerides, or aqueous injection 
suspensions which contain viscosity-increasing substances, for example sodium 
carboxymethylcellulose, sorbitol and/or dextran, and, if necessary, also stabilisers. 

The dose of the active ingredient depends on the warm-blooded animal species, the age 
and the individual condition and on the manner of administration. For example, an 
approximate daily dose of about 10 mg to about 250 mg is to be estimated in the case of 
oral administration for a patient weighing approximately 75 kg. 

g. Transformation and Transfection 

DNA can be stably incorporated into cells or can be transiently expressed using 
methods known in the art and described below. Stably transfected cells can be prepared 
by transfecting cells with an expression vector containing a selectable marker gene, and 
growing the transfected cells under conditions selective for cells expressing the marker 
gene. To prepare transient transfectants, cells are transfected with a reporter gene to 
monitor transfection efficiency. 

There are many well-known methods of introducing foreign nucleic acids into 
host cells, which include electroporation, calcium phosphate co-precipitation, particle 
bombardment, microinjection, naked DNA, liposomes, lipofection, and viral infection 
(see, e.g., Sambrook et al. (1989) Molecular Cloning: A Laboratory Manual, Second 
Edition, Cold Spring Harbor Laboratory Press, and Mountain, A. Trends Biotechnol. 18: 
119-128 (2000) for a review). Any of the above methods can be used, as long as it is 
compatible with the host cell. Linear nucleic acid molecules have been found to be more 
efficiently incorporated into mammalian genomes than circular plasmids. Additionally, 
nucleic acid molecules may be delivered to specific target tissues or to individual cells. 
Viral-based gene transfer is often favoured for introducing nucleic acids into mammalian 
cells and specific target tissues, and several viral delivery approaches are in clinical trials 
for gene therapy applications. However, non-viral methods are attractive due to their 
greater safety for the purpose of gene transfer to humans. 

The preferred methods of particle bombardment use biolistic particles made from gold (or 
tungsten). Compared with other transfection procedures, particle bombardment requires a 
smaller amount of nucleic acid and fewer cells, making the procedure 
generally more efficient (Heiser, W. C. Anal. Biochem. 217: 185-196 (1994); Klein, T. M. 
& Fitzpatrick-McElligott, S. Curr. Opin. Biotechnol. 4: 583-590 (1993)). The procedure 
is particularly suited for organisms that are difficult to transfect, and for introducing DNA 
into organelles, such as mitochondria and chloroplasts. Although generally used for ex 
vivo applications, the procedure is also suitable for in vivo transfection of skin tissue. 
Suitable methods are known in the art and described, for instance, in US Patent Nos. 
5,489,520 and 5,550,318. See also, Potrykus (1990) Bio/Technol. 8: 535-542; and 
Finnegan et al. (1994) Bio/Technol. 12: 883-888. 



Microinjection is a common method of nucleic acid delivery to isolated cells 
(Palmiter, R. D. & Brinster, R. L. Annu. Rev. Genet. 20: 465-499 (1986); Wall, R. J. et 
al., J. Cell Biochem. 49: 113-120 (1992); Chan, A. W. et al., Proc. Natl. Acad. Sci. USA 
95: 14028-14033 (1998)). DNA is generally injected into cells and the cells may then be 
re-introduced into animals. Procedures for such a technique are described in US Pat. Nos. 
5,175,384 and 5,434,340, and improvements to the technique are described in WO 
00/69257. 



Efficient gene transfer in vivo can be obtained following local injection of 
naked DNA. While expression of injected DNA in skin lasts for only a few days, expression 
of injected DNA in mouse skeletal muscle has been shown to last for up to nine months 
(Wolff, J. A. et al., Hum. Mol. Genet. 1: 363-369 (1992)). Naked DNA is particularly 
suited to gene therapy for preventive and therapeutic vaccines. 

Cationic liposomes containing cholesterol are particularly suited for delivery of 
nucleic acids to humans as they are biodegradable and stable in the bloodstream. 
Liposomes can be injected intravenously or subcutaneously, or inhaled as an aerosol 
(Stribling et al. (1992) Proc. Natl. Acad. Sci. USA 89:11277-11281). Liposomes can be 
targeted to certain cell types by incorporating ligands, receptors or antibodies 
(immunolipids) into the lipid membrane (U.S. Pat. No. 4,957,773). On contacting target 
cells, entry of DNA from liposomes is via endocytosis and diffusion. Preparations of 
lipid formulations are commercially available and methods for their use are well 
documented (Bogdanenko, E. V. et al., Vopr. Med. Khim. 46: 226-245 (2000); Natsume, 
A. et al., Gene Ther. 6: 1626-1633 (1999)). 

Uptake of DNA into animal cells can also be enhanced by using transfection 
agents. "Transfecting agent", as utilised herein, means a composition of matter added to 
the genetic material for enhancing the uptake of exogenous DNA segment(s) into a 
eukaryotic cell, preferably a mammalian cell, and more preferably a mammalian germ 
cell. The enhancement is measured relative to the uptake in the absence of the 
transfecting agent. Examples of transfecting agents include adenovirus-transferrin-polylysine-DNA 
complexes. These complexes generally augment the uptake of DNA into 
the cell and reduce its breakdown during its passage through the cytoplasm to the nucleus 
of the cell. These complexes can be targeted to the male germ cells using specific ligands 
which are recognised by receptors on the cell surface of the germ cell, such as the c-kit 
ligand or modifications thereof. Other preferred transfecting agents include lipofectin™, 
lipofectamine™, DMRIE-C, Superfect, and Effectin (Qiagen), unifectin, maxifectin, 
DOTMA, DOGS (Transfectam; dioctadecylamidoglycylspermine), DOPE 
(1,2-dioleoyl-sn-glycero-3-phosphoethanolamine), DOTAP (1,2-dioleoyl-3-trimethylammonium 
propane), DDAB (dimethyl dioctadecylammonium bromide), DHDEAB 
(N,N-di-n-hexadecyl-N,N-dihydroxyethyl ammonium bromide), HDEAB 
(N-n-hexadecyl-N,N-dihydroxyethylammonium bromide), polybrene, or poly(ethylenimine) (PEI). See, for 
example, Banerjee, R. et al., Novel series of non-glycerol-based cationic transfection 
lipids for use in liposomal gene delivery, J. Med. Chem. 42 (21): 4292-99 (1999); 
Godbey, W. T. et al., Improved packing of poly(ethylenimine)-DNA complexes 
increases transfection efficiency, Gene Ther. 6 (8): 1380-88 (1999); Kichler, A. et al., 
Influence of the DNA complexation medium on the transfection efficiency of 
lipospermine/DNA particles, Gene Ther. 5 (6): 855-60 (1998); Birchall, J. C. et al., 
Physico-chemical characterisation and transfection efficiency of lipid-based gene delivery 
complexes, Int. J. Pharm. 183 (2): 195-207 (1999). These non-viral agents have the 
advantage that they facilitate stable integration of xenogeneic DNA sequences into the 
vertebrate genome, without size restrictions commonly associated with virus-derived 
transfecting agents. 



The most critical issues for applications such as gene therapy are the efficient 
delivery and appropriate expression of transgenes in host cells. For this purpose, viral 
systems are particularly well suited as viruses have evolved to efficiently cross the plasma 
membrane of eukaryotic cells and express their nucleic acids in host cells. Suitability of 
viral vectors is assessed primarily on their ability to carry foreign nucleic acids and 
deliver and express transgenes with high efficiency. Current applications utilise both 
RNA and DNA virus based systems, and 70% of gene therapy trials use viral vectors 
derived from retroviruses, adenovirus, adeno-associated virus, herpesvirus and pox virus. 
See, for example, Flotte et al. (1995) Gene Ther. 2:357-362; Glorioso et al. (1995) Ann. 
Rev. Microbiol. 49:675-710; Smith (1995) Ann. Rev. Microbiol. 49:807-838; Prince 
(1998) Pathology 30:335-347; and Robbins et al. (1998) Pharmacol. Ther. 80:35-47. 
Retroviruses represent the most prominent gene delivery system as they mediate high 
gene transfer and expression of therapeutic genes. Members of the DNA virus family 
such as adenovirus, adeno-associated virus or herpesvirus are popular due to their 
efficiency of gene delivery. Adenoviral vectors are particularly suited when transient 
transfection of nucleic acid is preferred. Retroviruses express particular envelope 
proteins that bind to specific cell surface receptors on host cells, in order for the virus to 
enter the cell. Hence, the type of viral vector used should be determined by the tissue type 
to be targeted. See, e.g., Dornburg (1995) Gene Ther. 2:301-310; Gunzburg, et al. (1996) 
J. Mol. Med. 74:171-182; Vile et al. (1996) Mol. Biotechnol. 5:139-158; Miller (1997) 
"Development and Applications of Retroviral Vectors" Cold Spring Harbor Laboratory 
Press, Cold Spring Harbor, New York; Karavanas et al. (1998) Crit. Rev. Oncol. 
Hematol. 28:7-30; Hu et al. (2000) Pharmacol. Rev. 52: 493-511; and Walther et al. 
(2000) Drugs 60: 249-271 for reviews. 

Safety is a critical issue for viral-based gene delivery because most viruses are 
either pathogens or have pathogenic potential. Generally, when a replication-competent 
virus infects an animal cell it can express viral genes and release many new infectious 
viral particles in the host organism. Hence, it is very important that during transgene 
delivery the host animal does not receive a pathogenic virus with full replication 
potential. For this reason, viral-host cell systems have been developed for gene therapy 
treatments to prevent the creation of replication-competent viruses. In this method, viral 
components are divided between a vector and a helper construct to limit the ability of the 
virus to replicate (Miller 1997). The viral vector contains the gene(s) of interest and 
cis-acting elements that allow gene expression and replication, but contains deletions of some 
or all of the viral proteins. Helper cells (or occasionally, helper virus) are engineered to 
express the viral proteins needed to propagate the viral vectors. These new viral particles 
are able to infect target cells, reverse transcribe the vector RNA and integrate its DNA 
copy into the genome of the host, which can then be expressed. However, the vector 
cannot express the viral proteins required to create new infectious particles. Helper cell lines 
are known in the art (see Hu, W-S & Pathak, V. K. Pharmacol. Rev. 52: 493-511 (2000), 
for a review). 

In general, retroviral vectors are able to package reasonably long stretches of 
foreign DNA (up to 10 kb). Oncoviruses are a type of retrovirus that infect only 
rapidly dividing cells. For this reason they are especially attractive for cancer therapy. 
Murine leukaemia virus (MLV)-based vectors are the most commonly used of this class. 
Spleen necrosis virus (SNV), Rous sarcoma virus and avian leukosis virus are other types. 
Lentiviral vectors are retroviral vectors that can be propagated to produce high viral titres 
and are able to infect non-dividing cells. They are more complex than oncoviruses and 
require regulation of their replication cycle. Lentiviral vectors which may be used 
include human immunodeficiency virus (HIV-1 and -2) and simian immunodeficiency 
virus (SIV) based systems. HIV infects cells of the immune system, most importantly 
CD4+ T-lymphocytes, and so may be useful for targeted gene therapy of this cell type. 
Another type of retrovirus is the spumavirus. Spumaviruses are attractive because of 
their apparent lack of toxicity. Linial (1999) J. Virol. 73:1747-1755. 

Adenoviral vectors have high transduction efficiency and are able to transfect a 
number of different cell types, including non-dividing cells. They have a high capacity for 
foreign DNA and can carry up to 30 kb of non-viral DNA (for a review see Kochanek, S. 
Hum. Gene Ther. 10: 2451-2459 (1999)). Recombinant adenoviral (rAd) vectors are 
becoming one of the most powerful gene delivery systems available and have been used 
to deliver DNA to post-mitotic neurons of the central nervous system (CNS) (Geddes, B. 
J. et al., Front. Neuroendocrinol. 20: 296-316 (1999)), and are used to treat diseases such 
as colon cancer (Alvarez et al., Hum. Gene Ther. 5: 597-613 (1997)). Adeno-associated 
virus (AAV) vectors and recombinant AAV (rAAV) vectors are proving themselves to be 
safe and efficacious for the long-term expression of proteins to correct genetic disease. 
Snyder, R. O. (J. Gene Med. 1: 166-175 (1999)) provides a review of gene delivery 
approaches using such vectors. Construction of such vectors is described in, for example, 
Samulski et al., J. Virol. 63: 3822-3828 (1989), and U.S. Pat. No. 5,173,414. 

Many gene therapy trials have been conducted and are underway (over 3,500 
people have been treated with gene therapy systems), and several reviews can be studied 
for details of the protocols and results (Hwu & Rosenberg, Ann N Y Acad Sci. 1994 May 
31;716:188-97; Blaese, Hosp Pract (Off Ed). 1995 Nov 15;30(11):33-40; Blaese, Hosp 
Pract (Off Ed). 1995 Dec 15;30(12):37-45; Breau & Clayman, Curr Opin Oncol. 1996 
May;8(3):227-31; Dunbar, Annu Rev Med. 1996;47:11-20; Lotze, Cancer J Sci Am. 
1996 Mar;2(2):63). The first gene therapy trial was carried out by Blaese et al. (1995), to 
correct a genetic disorder known as adenosine deaminase (ADA) deficiency, which leads 
to severe immunodeficiency. Several cancer gene therapy strategies are being developed, 
which involve eliminating cancer cells by suicide therapy (Oldfield et al., Hum Gene 
Ther. 1993 Feb;4(1):39-69), modification of cancer cells to promote immune responses 
(Lotze et al., Hum Gene Ther. 1994 Jan;5(1):41-55), and reversion by delivery of a tumor 
suppressor gene (Roth et al., Hum Gene Ther. 1996 May 1;7(7):861-74). Another 
successful gene therapy trial has been conducted to combat graft-versus-host disease, 
which can result following transplant procedures such as bone marrow transplants 
(Bonini et al., Science. 1997 Jun 13;276(5319):1719-24). This procedure was carried out 
using an HSV-based vector. Several gene therapy treatments are under investigation for 
the treatment of HIV-1 infection. Most treatments involve modification of lymphocytes, 
ex vivo, to suppress the expression of viral genes, by means of ribozymes, antisense RNA, 
mutant trans-dominant regulatory proteins and modification to elicit a host immune 
response (Nabel et al., Cardiovasc Res. 1994 Apr;28(4):445-55; Galpin et al., Hum Gene 
Ther. 1994 Aug;5(8):997-1017; Morgan RA, Walker R., Gene therapy for AIDS using 
retroviral mediated gene transfer to deliver HIV-1 antisense TAR and transdominant Rev 
protein genes to syngeneic lymphocytes in HIV-1 infected identical twins, Hum Gene Ther. 
1996 Jun 20;7(10):1281-306; Wong-Staal et al., Hum Gene Ther. 1998 
Nov 1;9(16):2407-25). Vectors currently in use for gene therapy treatments and animal 
tests include those derived from Moloney murine leukemia virus, such as MFG and 
derivatives thereof, and the MSCV retroviral expression system (Clontech, Palo Alto, 
California). Many other vectors are also commercially available. 
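
The packaging capacities and host-cell properties stated above for retroviral and adenoviral vectors can be collected into a small lookup, sketched below in Python purely for illustration. The class `Vector`, the list `VECTORS` and the function `candidate_vectors` are hypothetical names, and applying the general 10 kb retroviral limit to lentiviral vectors is an assumption drawn from the statement that retroviral vectors in general package up to 10 kb. Other considerations discussed in the text (safety, tissue tropism, duration of expression) are deliberately left out.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Vector:
    name: str
    max_insert_kb: float        # foreign DNA capacity as stated in the text
    infects_nondividing: bool   # whether non-dividing cells can be infected


# Values taken from the description; the lentiviral capacity is an assumption
# (the text gives 10 kb for retroviral vectors in general).
VECTORS = [
    Vector("oncoretroviral (e.g. MLV-based)", 10.0, False),
    Vector("lentiviral (e.g. HIV-1-based)", 10.0, True),
    Vector("adenoviral", 30.0, True),
]


def candidate_vectors(insert_kb, target_dividing):
    """Return names of vectors that can carry the insert and infect the target."""
    return [
        v.name
        for v in VECTORS
        if insert_kb <= v.max_insert_kb
        and (target_dividing or v.infects_nondividing)
    ]


if __name__ == "__main__":
    # An 8 kb construct aimed at non-dividing cells (e.g. post-mitotic neurons).
    print(candidate_vectors(8, target_dividing=False))
```

A filter of this kind only narrows the choice; the final selection would still depend on the tissue to be targeted, as noted above.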

Viral vectors are especially important in applications when a specific tissue type is 
to be targeted, such as for gene therapy applications. There are two available methods for 
targeting genes to a specific cell or tissue type. One strategy is designed to control 
expression of the required gene using a tissue specific promoter (discussed above), and 
another strategy is to control viral entry into cells. Viruses tend to enter specific cell types 
according to the envelope proteins that they express. However, by engineering the 
envelope proteins to express specific proteins as fusions, such as erythropoietin, 
insulin-like growth factor I and single chain variable fragment antibodies, viral vectors can be 
targeted to specific cell-types (Kasahara et al., Science. 1994 Nov 25;266(5189):1373-6; 
Somia et al., Proc Natl Acad Sci USA. 1995 Aug 1;92(16):7570-4; Jiang et al., J Virol. 
1998 Dec;72(12):10148-56; Chadwick et al., J Mol Biol. 1999 Jan 15;285(2):485-94). 

In one example of tissue specific targeting in transgenic mice, a novel transgene 
delivery system has been developed in which the target tissue type expresses an avian 
viral receptor (TVA), under the control of a tissue specific promoter. Transgenic mice 
expressing the TVA receptor are then infected with avian leukosis virus, carrying the 
transgene(s) of interest (Fisher, G. H. et al., Oncogene 18: 5253-5260 (1999)). 

h. Construction of Zinc Finger Libraries 

Zinc finger libraries may be constructed from naturally-occurring human zinc 
finger modules. Thus, the invention provides libraries of zinc finger modules. Module 
libraries according to the invention may be assembled combinatorially into zinc finger 
polypeptides. The combinatorial assembly may be carried out biologically, using random 
assembly and selection technologies, or in a directed manner under computer control, 
assembling desired modules to produce zinc fingers having defined or random specificity. 
In accordance with the invention, libraries may be constructed entirely from natural zinc 
finger polypeptide modules from which zinc finger polypeptides having any desired 
specificity may be isolated. The invention, in its most preferred aspect, does not require 
the engineering of the specificity of any zinc finger module in order to produce a zinc 
finger polypeptide having specificity for any desired nucleic acid sequence. 

Selection of appropriate zinc finger modules for assembly into libraries of 
composite binding polypeptides having a predetermined binding specificity can be 
accomplished by applying the rules for zinc finger binding specificity set forth herein. In 
the case of zinc finger assembly under computer control, a rule table may be used to 
select zinc fingers for binding to the target site. Figure 1 shows a flowchart depicting part 
of the logic used in the selection of zinc fingers from a natural library in accordance with 
the invention. The logic set forth in Figure 1 may be supplemented, for example using 
rules relating to zinc finger overlap. Functional testing of zinc fingers for binding to the 
desired binding site may be implemented in an automated fashion and integrated with the 
zinc finger design system. 
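
As a minimal sketch of a rule-table lookup of the kind described above, the Python fragment below splits a target site into consecutive 3-nucleotide subsites and lists the library fingers whose predicted subsite matches each one. It does not reproduce the logic of Figure 1 and ignores finger overlap. The names `RULE_TABLE`, `matches` and `select_fingers` are hypothetical, the handful of table entries is copied from the database of Example 1, and the treatment of N (any base) and Y (C or T) as standard degenerate codes is an assumption.

```python
# Hypothetical sketch: pick library fingers whose predicted 3-nucleotide
# subsite matches each consecutive triplet of a target site.

# Degenerate codes assumed for the predicted subsites (N = any base, Y = C or T).
IUPAC = {"A": "A", "C": "C", "G": "G", "T": "T", "Y": "CT", "N": "ACGT"}

# (finger name, predicted subsite) pairs copied from Example 1; not exhaustive.
RULE_TABLE = [
    ("ZIF268 F1", "GCG"),
    ("ZIF268 F2", "TGG"),
    ("ZIF268 F3", "GCG"),
    ("WT1 F2", "GAG"),
    ("SP1 F1", "GGG"),
    ("TYY1", "NAA"),
    ("ZN75", "TAY"),
]


def matches(pattern, triplet):
    """True if every base of the triplet is allowed by the degenerate pattern."""
    return len(pattern) == len(triplet) and all(
        base in IUPAC[code] for code, base in zip(pattern, triplet)
    )


def select_fingers(target):
    """Return (triplet, candidate finger names) for each triplet of the target."""
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return [
        (t, [name for name, pattern in RULE_TABLE if matches(pattern, t)])
        for t in triplets
    ]


if __name__ == "__main__":
    for triplet, candidates in select_fingers("GCGTGGGCG"):
        print(triplet, candidates or "no candidate in table")
```

Candidates found in this way are predictions only; as stated above, functional testing for binding to the desired site would follow.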

The invention thus provides libraries of zinc finger modules. In one embodiment, 
the modules are human zinc finger modules. Preferably, the modules are DNA-binding 
zinc finger modules. 

In a preferred aspect the invention provides a library of DNA-binding human zinc 
finger modules as set out in Example 1 below. Moreover, the invention provides a library 
of human zinc finger modules as set forth in Example 2 below. Sub-libraries can be 
prepared from either of the libraries of the invention. 

The invention furthermore encompasses libraries in which zinc finger modules as 
set forth in Examples 1 or 2 herein are combined with other zinc finger modules to 
provide further libraries that may be used to generate zinc finger polypeptides. 

In a still further aspect, the invention relates to libraries derived from animals 
other than humans, for use in said organisms in order to derive some or all of the same 
advantages as may be obtained with human zinc fingers for use in humans. Example 3 
sets forth databases of zinc fingers from mouse, chicken and plants. Sequences of zinc 
fingers can be identified in other organisms by the same means, i.e. by analysis of 
sequence information and identification of zinc fingers in accordance with the guidance 
given herein. 

EXAMPLES 

Example 1. List of selected human DNA-binding zinc fingers. 

These fingers have been selected from the human genome on the basis of a prediction that 
they have a DNA-binding potential. This prediction is based on coded contacts (WO 
96/06166, WO 98/53057, WO 98/53058, WO 98/53059 and WO 98/53060); 
accordingly, for each peptide unit, a 3-nucleotide DNA target subsite is shown, as the 
preferred sequence to which the zinc finger binds. Hence, by constructing 2- or 3-finger 
libraries from these 200 or so units, in the manner described in the Examples infra, there 
exists the potential to screen a large variety of novel DNA target sites. Note that the 
predicted DNA target subsites listed below are merely intended to be a guide to the 
DNA-binding potential. It is anticipated that, in practice, an even wider range of DNA 
sequences can be targeted using a library engineered from this database, through the 
exertion of a positive selection pressure in the library screening system. 
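
To give a sense of the scale of such 2- or 3-finger libraries, the short Python sketch below counts ordered finger combinations. It assumes, for illustration only, that any unit may occupy any position and that repeats are allowed; the function name `library_size` is hypothetical.

```python
# Hypothetical sketch: number of ordered finger combinations in a library,
# assuming any unit may occupy any position (repeats allowed).

def library_size(n_units, n_fingers):
    """Number of ordered n_fingers-long combinations of n_units modules."""
    return n_units ** n_fingers


if __name__ == "__main__":
    # 200 is the approximate figure in the text; 227 is the size of the database.
    for units in (200, 227):
        for fingers in (2, 3):
            print(units, "units,", fingers, "fingers:", library_size(units, fingers))
```

Under this assumption roughly 200 units give 40,000 two-finger and 8,000,000 three-finger combinations.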


The fingers listed below are in a format that can be linked with classical wild-type 
canonical "TGEKP" (SEQ ID NO:3) linkers (i.e. ...TGEKP - zinc finger peptide 
sequence - TGEKP - zinc finger peptide sequence - TGEKP - etc.). For each peptide 
sequence, an oligonucleotide is designed to encode the peptide sequence; the 
oligonucleotide can then be linked into a library selection system, as described in the 
Examples infra. 
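
The linking and encoding steps just described can be sketched as follows in Python. This is an illustration under stated assumptions, not the oligonucleotide design actually used: the names `assemble`, `back_translate` and `CODON` are hypothetical, the sketch places the linker only between fingers, and the codon table simply picks one standard-code codon per amino acid without regard to codon usage or cloning sites. The two peptide sequences are ZIF268 F1 and F2 from the database below, written without spacing.

```python
# Hypothetical sketch: join finger peptides with TGEKP linkers and
# back-translate the result into one possible coding sequence.

LINKER = "TGEKP"

# One arbitrarily chosen standard-code codon per amino acid (an assumption;
# a real design would weigh codon usage and restriction sites).
CODON = {
    "A": "GCT", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}


def assemble(fingers, linker=LINKER):
    """Join finger peptides, N- to C-terminus, with the linker between them."""
    return linker.join(fingers)


def back_translate(peptide):
    """Encode a peptide as DNA using the fixed codon table above."""
    return "".join(CODON[residue] for residue in peptide)


if __name__ == "__main__":
    zif268_f1 = "YACPVESCDRRFSRSDELTRHIRIH"
    zif268_f2 = "FQCRICMRNFSRSDHLSTHIRTH"
    peptide = assemble([zif268_f1, zif268_f2])
    print(peptide)
    print(back_translate(peptide))
```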

Database of predicted human DNA-binding zinc fingers 

227 finger units 



Zinc finger 


DNA site 


SEQ ID 


Peptide sequence 






NO 










ZIF268 Fl 


GCG 


31 


YACP VE S CDRRF S RSDELTRH I R I H 


ZIF268 F2 


TGG 


32 


FQCRICMRNFSRSDHLSTHIRTH 


ZIF268 F3 


GCG 


33 


FACDICGRKFARSDERKRHTKIH 


Kr-likel3 


NGT 


34 


HKCHYAGCEKVYGKSSHLKAHLRTH 


MAZ Fl 


AGG 


35 


YQCPVCQQRFKRKDRMSYHVRSH 
MAZ F2          TGG          36         YNCSHCGKSFSRPDHLNSHVRQVH
MAZ F3          NGT          37         FKCEKCEAAFATKDRLRAHTVRH
TIEG2 (SP1) F3  GGG          38         FVCPVCDRRFMRSDHLTKHARRH
SP1 F1          GGG          39         HKCHYAGCEKVYGKSSHLKAHLRTH
SP1 F2          GGG          40         FACSWQDCNKKFARSDELARHYRTH
SP1 F3          GGG          41         FSCPICEKRFMRSDHLTKHARRH
WT1 F1          TGT          42         FMCAYPGCNKRYFKLSHLQMHSRKH
WT1 F2          GAG          43         YQCDFKDCERRFSRSDQLKRHQRRH
WT1 F3          TGG          44         FQCKTCQRKFSRSDHLKTHTRTH
WT1 F4          GCG          45         FSCRWPSCQKKFARSDELVRHHNMH
TYY1            TAT          46         FQCTFEGCGKRFSLDFNLRTHVRIH
TYY1            NAA          47         YVCPFDGCNKKFAQSTNLKSHILTH
TF3A            GGG          48         FVCDYEGCGKAFIRDYHLSRHILTH
TF3A            GGC          49         FKCTQEGCGKHFASPSKLKRHAKAH
MAZ             GGC          50         HACEMCGKAFRDVYHLNRHKLSH
GLI1            GCA          51         YMCEHEGCSKAFSNASDRAKHQNRTH
ZIC3            GCA          52         FKCEFEGCDRRFANSSDRKKHMHVH
SP4             NGG          53         HICHIEGCGKVYGKTSHLRAHLRWH
SP2             NTG          54         HVCHIPDCGKTFRKTSLLRAHVRLH
BTE1            NGG          55         HKCPYSGCGKVYGKSSHLKAHYRVH
GLI2            TAG          56         HKCTFEGCSKAYSRLENLKTHLRSH
Q14872          TAT          57         YQCTFEGCPRTYSTAGNLRTHQKTH
Q14872          TGC          58         FRCDHDGCGKAFAASHHLKTHVRTH
ZIC3            TAG          59         FPCPFPGCGKIFARSENLKIHKRTH
Z143            CTT          60         FKCPFEGCGRSFTTSNIRKVHVRTH
Z143            CGT          61         FRCEYDGCGKLYTTAHHLKVHERSH
O00153          AAT          62         FMCHESGCGKQFTTAGNLKNHRRIH
Z143            AAC          63         YYCTEPGCGRAFASATNYKNHVRIH
Q14872          TCT          64         FVCNQEGCGKAFLTSHSLRIHVRVH
O00153          TGT          65         FICPAEGCGKSFYVLQRLKVHMRTH
Q14872          GCT          66         FNCESEGCSKYFTTLSDLRKHIRTH
Z143            GCT          67         YRCSEDNCTKSFKTSGDLQKHIRTH
BTE1            GCG          68         FPCTWPDCLKKFSRSDELTRHYRTH
O15391          TAA          69         FVCPFDVCNRKFAQSTNLKTHILTH
Z143            GNC          70         YVCTVPGCDKRFTEYSSLYKHHWH
O43591          GGT          71         HVCEHCNAAFRTNYHLQRHVFIH
BCL6            TAG          72         YRCNICGAQFNRPANLKTHTRIH
O75626          TAG          73         HECQVCHKRFSSTSNLKTHLRLH
O75626          YAA          74         YECNVCAKTFGQLSNLKVHLRVH
BCL6            NGA          75         YKCETCGARFVQVAHLRAHVLIH
                GGA          76         FKCQTCNKGFTQLAHLQKHYLVH
ZN45            N (N/T) A    77         YRCDVCGKRFRQRSYLQAHQRVH
BCL6            YTY          78         YPCEICGTRFRHLQTLKSHLRIH
GFI1            GCA          79         YPCQYCGKRFHQKSDMKKHTFIH
Z263            GAN          80         YQCNICGKCFSCNSNLHRHQRTH
ZN75            TAY          81         YRCSWCGKSFSHNTNLHTHQRIH
Z186            TTT (YYY)    82         YKCIECGKTFTVNQLLTLHHRTH
Z136            TTT (YYY)    83         FKCKQCGKAFSCSPTLRIHERTH
Z136            TGA          84         YKCKVCGKAFDYPSRFRTHERSH
Z136            TTT (YYY)    85         YKCKVCGKPFHSLSSFQVHERIH
Z177            TTA          86         YECKECGKAFRNSSCLRVHVRTH
Z136            TNN          87         FECKRCGKAFRSSSSFRLHERTH
O60765          A/T-YT       88         YRCNECGKGFTSISRLNRHRIIH
ZN42            TYT          89         YHCGECGLGFTQVSRLTEHQRIH
ZN42            CGG          90         FVCGDCGQGFVRSARLEEHRRVH
O14913          TCG          91         YKCEKCGKGFFRSSDLQHHQKIH
O14913          C-G/T-G      92         YKCEECGKGFSRSSKLQEHQTIH
ZN45            YYC          93         YKCEECGKGFCRASNLLDHQRGH
ZN45            AAA          94         YKCEECGKGFSQASNLLAHQRGH
ZN45            NAG          95         YQCEECGKGFCRASNFLAHRGVH
Z239            YYG          96         YKCEQCGKGFTRSSSLLIHQAVH
O94892          YNY          97         YRCSECGKGFIVNSGLMLHQRTH
ZN45            [illegible]  98         YQCAECGKGFSVGSQLQAHQRCH
ZN45            NGY          99         YKCEECGKGFSVGSHLQAHQISH
ZN45            YCG          100        YQCDACGKGFSRSSDFNIHFRVH
ZN45            CCG          101        YKCGTCGKGFSRSSDLNVHCRIH
ZN45            TGA          102        YKCNACGKSFSYSSHLNIHCRIH
Z239            [illegible]  103        YQCYECGKGFSQSSDLRIHLRVH
Z239            YAA          104        YKCGECGKGFSQSSNLHIHRCIH
Z239            YGA          105        YKCDKCGKGFSQSSKLHIHQRVH
Z239            CGA          106        YHCGKCGKGFSQSSKLLIHQRVH
O60765          AYA          107        FKCSECGRAFSQSASLIQHERIH
O60792          GYY          108        YECKECGKAFIRSSSLAKHERIH
ZN07            [illegible]  109        YPCKECGKAFSQSSTLAQHQRMH
O43296          AYY          110        YKCSECGKAFSRSSSLTQHQRMH
Z134            ATG          111        YKCSECGKAFSRKDTLVQHQRIH
Z134            ATG          112        YECSECGKAFSRKATLVQHQRIH
ZN84                         113        YECSECGKAFSEKLSLTNHQRIH
Z191            AYG          114        YGCVECGKAFSRSSILVQHQRVH
ZN24            ACG          115        YGCVECGKAFSRSSILVQHQRVH
O43338          GTA          116        YVCGQCGKSFSQRATLIKHHRVH
O43339          GTA          117        YECSQCGKSFSQKATLVKHQRVH
O43338          AYA          118        YDCGQCGKSFIQKSSLIQHQWH
O43339          ANA          119        YECGQCGKSFSQKSGLIQHQWH
O43338                       120        YECGECGKSFSQSSNLIEHCRIH
Q13398          AAA          121        YECGECGKSFSQRSNLMQHRRVH
Z135            [illegible]  122        YECGECGKAFSQSTLLTEHRRIH
Q13398                       123        YECSECGKSFSQSSSLIQHRRVH
O14709          AAA          124        YKCNECGKAFSQSAYLLNHQRIH
O14709          CAA          125        YKCNECGKVFSQNAYLIDHQRLH
O14709                       126        YKCTECGKAFTQSAYLFDHQRLH
O14709          CAA          127        YKCDECGKTFAQTTYLIDHQRLH
O60792                       128        YNCNECRKTFSQSTYLIQHQRIH
O15535          ANA          129        YHCKECGKVFSQSAGLIQHQRIH
Q15776 (a)      TNA          130        YHCKECGKAFSQNTGLILHQRIH
Q15776 (b)      TNA          131        YQCNQCGKAFSQSAGLILHQRIH
Q15776          CNA          132        YKCNECGRAFSQKSGLIEHQRIH
ZN84            AAC          133        YGCNECGRAFSEKSNLINHQRIH
Z191            ANA          134        YKCLECGKAFSQNSGLINHQRIH
ZN24            ANA          135        YKCLECGKAFSQNSGLINHQRIH
O60765          [illegible]  136        YRCEECGISFGQSSALIQHRRIH
ZN07            YYA          137        YRCEECGKAFGQSSSLIHHQRIH
O43340          ACA          138        YECDECGKSYSQSSALLQHRRVH
Z135            CYY          139        YKCQECGKAFSHSSALIEHHRTH
O43340          AYA          140        YDCSECGKSFRQVSVLIQHQRVH
O43340                       141        YVCSECGKSFGQKSVLIQHQRVH
Q13398          AYT          142        YQCSQCGKSFGCKSVLIQHQRVH
O15535          GNA          143        HKCDECGKSFTQSSGLIRHQRIH
Q15776          GNA          144        HKCDECGKSFAQSSGLVRHWRIH
O75802          ANG          145        HKCEECGKAFSRSSGLIQHQRIH
Z189            ANG          146        HKCEECGKAFSRSSGLIQHQRIH
O75802          ANG          147        HKCDECGKAFSRNSGLIQHQRIH
Q13398          YYG          148        HECNECGKSFSRSSSLIHHRRLH
Z195            [illegible]  149        YKCDECGKNFTQSSNLIVHKRIH
O43309          CYA          150        YKCDKCGKAFTQRSVLTEHQRIH
Z195            CGA          151        YKCDECGKAYTQSSHLSEHRRIH
ZN45            YYA          152        YKCERCGKAFSQFSSLQVHQRVH
O60893          YYN          153        YECEDCGKTFIGSSALVIHQRVH
ZN07            TAT          154        YECLQCGKAFSMSTQLTIHQRVH
O60893          CYA          155        YECDDCGKTFSQSCSLLEHHKIH
Q15776          NGG          156        YECDECGKTFRRSSHLIGHQRSH
ZN84            YGG          157        YECGECGKAFSRKSHLISHWRTH
Z177            YGA          158        YECDHCGKSFSQSSHLNVHKRTH
O43296          AYG          159        YECMECGKAFNRKSYLTQHQRIH
O43296          GNG          160        YECVECGKAFTRMSGLTRHKRIH
O43340          AGG          161        YECRECGKSFTRKNHLIQHKTVH
Z134            AAG          162        YECSECGKTFSRKDNLTQHKRIH
O43338          CGA          163        YECSECGKSFSQTSHLNDHRRIH
O75467          AGA          164        YECAQCGKAFSQTSHLTQHQRIH
Z135            AGA          165        YECSECGKAFRQSIHLTQHLRIH
Z135            AGA          166        YECHDCGKSFRQSTHLTQHRRIH
Z205            AGG          167        YACTDCGKRFGRSSHLIQHQIIH
O43296          AGG          168        YECTECGKTFIKSTHLLQHHMIH
O75290          AAG          169        YECKECGKYFSRSANLIQHQSIH
O75290          AGG          170        YECKECGKGFNRGAHLIQHQKIH
O75290          AGG          171        YECKECGKGFNRGAHLIQHQKIH
O60792          CGA          172        YTCNECGKAFSQRGHFMEHQKIH
O75123          CGA          173        YTCDQCGKGFGQSSHLMEHQRIH
O43337          GYA          174        YECNACGKAFSQSSTLIRHYLIH
O75802          GYY          175        YECNYCGKTFSVSSTLIRHQRIH
Z165            GGY          176        YECSECGKTFRVSSHLIRHFRIH
Z124            CYY          177        YVCNNCGKGFRCSSSLRDHERTH
Z135            AYY          178        YGCNECGKTFSHSSSLSQHERTH
O15361          GAY          179        YDCNHCGKSFNHKTNLNKHERIH
O75123          AAA          180        YVCNECGKRFSQTSNFTQHQRIH
Q13398          [illegible]  181        YVCGECGKSFSHSSNLKNHQRVH
ZN35            VVA          182        YTCNECGKAFRQRSSLTVHQRTH
Z157            YYC          183        YECTECGKTFSEKATLTIHQRTH
O43338          GYY          184        YECDECGKAFGSKSTLVRHQRTH
ZN84            TYC          185        YECSECGKAFGEKSSLATHQRTH
ZN07            GAA          186        YGCRECGKAFSQQSQLVRHQRTH
ZN84            YAA          187        YNCSQCGKAFSQKSQLTSHQRTH
Z186            YGY          188        YACDHCEKAFSHKSKLTVHQRTH
O43338          GGC          189        YVCGECGKAFMFKSKLVRHQRTH
OZF             [illegible]  190        YECNVCGKAFSQSSSLTVHVRSH
O95779          YYY          191        YKCKECGKAFNHCSLLTIHERTH
Z135            GYY          192        YACRDCGKAFTHSSSLTKHQRTH
ZN80            GYA          193        YECKECGKGFYYSYSLTRHTRSH
Z177            GYC          194        YECSDCGKAFIDQSSLKKHTRSH
Z177            GYY          195        YDCKECGKAFTVPSSLQKHVRTH
O43337          ACT          196        YDCMACGKAFRCSSELIQHQRIH
Q14585          AGY          197        YECKECEKAFRSGSKLIQHQRMH
Q14585          [illegible]  198        YECIDCGKAFGSGSNLTQHRRIH
Q14585          GYY          199        YECKACGMAFSSGSALTRHQRIH
Q14585          AYY          200        YECKECGKAFYSGSSLTQHQRIH
Q14585          AAY          201        YECKECGKAFGSGANLAYHQRIH
Q14585          GAY          202        FECKECGKAFGSGSNLTHHQRIH
Q14585          ACY          203        YVCKECGKAFNSGSDLTQHQRIH
O60792                       204        YQCHECGKTFSYGSSLIQHRKIH
O60893          GNA          205        HYCHECGKSFAQSSGLTKHRRIH
Z165            GCC          206        YECNECGKSFAESSDLTRHRRIH
O60893          GAY          207        YECEECGKVFSHSSNLIKHQRTH
Q15776          NGY          208        YECNECGKAFSHSSHLIGHQRIH
Z135            GYY          209        YQCGECGKAFSHSSSLTKHQRIH
Z165            GGY          210        HQCNECGKAFRHSSKLARHQRIH
Z135            TYG          211        YECHECLKGFRNSSALTKHQRIH
O43361          YGC          212        YECNECGKFFLDSYKLVIHQRIH
O43361          YGC          213        YECSECGKFFRDSYKLIIHQRVH
Z140            YYG          214        YGCHECGKTFGRRFSLVLHQRTH
O60792          AAA          215        YECNECGKAFSQHSNLTQHQKTH
Z135            ANA          216        YKCTQCGRTFNQIAPLIQHQRTH
Z135            ANA          217        YECNQCGRAFSQLAPLIQHQRIH
Z135            ANA          218        YECHECGKAFTQITPLIQHQRTH
O43309          AGA          219        YKCNECGKAFGRWSALNQHQRLH
ZN83            AGA          220        YKCNECGKVFHNMSHLAQHRRIH
ZN83            AGY          221        YRCNVCGKVFHHISHLAQHQRIH
ZN83            AGA          222        YKCNECGKVFNQISHLAQHQRIH
O14709          CAY          223        FECSECGRAFSSNRNLIEHKRIH
ZN74            GYA          224        YKCSECGRAFSQNHCLIKHQKIH
Q13398          ANA          225        YECSECGKSFSQNFSLIYHQRVH
O75123          GYA          226        FECKECGKGFSQSSLLIRHQRIH
Z132 (a)        GGA          227        FECSECGRDFSQSSHLLRHQKVH
Z132            GYA          228        YECNECGKFFSQNSILIKHQKVH
Z132 (b)        GGA          229        YECDECGKAFSNRSHLIRHEKVH
Z132            GGN          230        YECSECGRAFSSNSHLVRHQRVH
Z132            AAA          231        YECSECGRAFNNNSNLAQHQKVH
Z134            ATY          232        YKCSDCGKVFRHKSTLVQHESIH
O75290          [illegible]  233        YECKECGKAFRLYLQLSQHQKTH
Z157            AYC          234        YECGECGKNFRAKKSLNQHQRIH
Z157            TTT          235        YECGECGKFFRMKMTLNNHQRTH
ZN07            AAT          236        YECAECGKVFRLCSQLNQHQRIH
Z157            AYT          237        YECSECGKIFSMKKSLCQHRRTH
O43361          GGY          238        YECNKCGKFFMYNSKLIRHQKVH
O43361          GTY          239        YKCSKCGKFFRYRCTLSRHQKVH
Z157            CGY          240        YECNECGNAFYVKARLIEHQRMH
Z157            CGY          241        YECSECGNAFYVKVRLIEHQRIH
O75123          AGG          242        FECNECGKAFIRSSKLIQHQRIH
ZN07            AGT          243        FKCTECGKAFRLSSKLIQHQRIH
O75123          GYT          244        YECNECGKAFFLSSYLIRHQKIH
O75802          AAT          245        HKCGECGKAFRLSTYLIQHQKIH
Z174            GCG RNA      246        YKCDDCGKSFTWNSELKRHKRVH
Z202            GCG RNA      247        YRCDDCGKHFRWTSDLVRHQRTH
O43345          GTG RNA      248        YKCEECGKAYKWPSTLSYHKKIH
O43345          CA? RNA      249        YKCEECGKAFNWSSNLMEHKKIH
O75346                       250        YRCEECGKAFNQSANLTTHKRIH
ZN43                         251        YKCEECGKAFTQSSNLTTHKKIH
ZN85            GGA          252        YKCEECGKAFNQSSKLTKHKKIH
ZN85            GAA          253        YTCEECGKAFNQSSNLTKHKRIH
Q02313          GAA          254        YKCEECGKAFNQLSNLTRHKVIH
Q02313          CAA          255        YKCEECGKAFKQFSNLTDHKKIH
Z141            GTG          256        YKCEECGKAFNRSTTLTKHKRIH
ZN91            TTG          257        YKCEECGKAFSRSSTLTKHKTIH
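
Several DNA sites in the table above contain degenerate positions such as N and Y. The text here does not define these symbols; assuming they follow the standard IUPAC nucleotide codes (N = any base, Y = C or T), the hypothetical helper `expand_site` below lists the concrete triplets covered by a degenerate site. It is offered only as a reading aid for the table.

```python
from itertools import product

# Standard IUPAC nucleotide codes (assumed to be the convention used in the table).
IUPAC = {
    "A": "A", "C": "C", "G": "G", "T": "T",
    "R": "AG", "Y": "CT", "S": "CG", "W": "AT", "K": "GT", "M": "AC",
    "B": "CGT", "D": "AGT", "H": "ACT", "V": "ACG", "N": "ACGT",
}


def expand_site(site):
    """Return every concrete DNA sequence matched by a degenerate site."""
    site = site.strip().upper()
    # Take one base from each position's allowed set, in every combination.
    return ["".join(bases) for bases in product(*(IUPAC[b] for b in site))]


print(expand_site("YAA"))  # site listed for SEQ ID NO:74
print(expand_site("NGT"))  # site listed for SEQ ID NO:34
```

Entries written in other notations, for example "N (N/T) A" or "TTT (YYY)", would need to be normalised by hand before being passed to such a helper.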
Example 2: List of all human C2H2 zinc fingers 

This list represents an even more comprehensive database of human zinc fingers, 
including those with non-DNA-binding activities, such as those mediating protein-protein 
interactions and those involved in RNA binding. By including fingers from this database 
in a natural finger selection system as disclosed herein, many new zinc finger proteins 
having unique target specificities can be obtained. All of these peptides would 
necessarily possess properties required for potential therapeutic agents, such as 
non-immunogenicity. 

The fingers listed below are in a format that can be linked with classical canonical 
"TGEKP" linkers (i.e. ...TGEKP - zinc finger peptide sequence - TGEKP - zinc finger 
peptide sequence - TGEKP - etc.). For each peptide sequence, an oligonucleotide is 
designed to encode the peptide sequence; the oligonucleotide can then be linked into a 
library selection system, as described in the Examples infra. 
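
As a rough illustration of designing an oligonucleotide to encode a peptide sequence, the sketch below back-translates a linker plus one finger into DNA using a single codon per amino acid. The codon choices are an arbitrary assumption made for illustration: any synonymous codons from the standard genetic code would encode the same peptide, and a real design would typically follow the codon preferences of the expression host. The names `CODON_FOR`, `back_translate` and `translate` are hypothetical.

```python
# One codon per amino acid, taken from the standard genetic code.
# Which synonymous codon is used for each residue is an arbitrary assumption.
CODON_FOR = {
    "A": "GCC", "C": "TGC", "D": "GAC", "E": "GAG", "F": "TTC",
    "G": "GGC", "H": "CAC", "I": "ATC", "K": "AAG", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCC", "Q": "CAG", "R": "CGC",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAC",
}
AMINO_ACID_FOR = {codon: aa for aa, codon in CODON_FOR.items()}


def back_translate(peptide):
    """Return a DNA sequence (coding strand) that encodes the peptide."""
    peptide = "".join(peptide.split()).upper()
    unknown = set(peptide) - set(CODON_FOR)
    if unknown:
        raise ValueError(f"unrecognised residues: {sorted(unknown)}")
    return "".join(CODON_FOR[aa] for aa in peptide)


def translate(dna):
    """Translate DNA built only from the codons in CODON_FOR back to a peptide."""
    if len(dna) % 3:
        raise ValueError("DNA length must be a multiple of three")
    return "".join(AMINO_ACID_FOR[dna[i:i + 3]] for i in range(0, len(dna), 3))


finger = "HQCAHCEKTFNRKDHLKNHFQTH"  # first entry of the list below (SEQ ID NO:258)
oligo = back_translate("TGEKP" + finger)
print(oligo)
```

Translating the result back with `translate(oligo)` gives a quick check that the designed oligonucleotide still encodes the intended linker and finger.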


Human zinc finger database 

968 finger units 



Name            SEQ ID NO  Peptide sequence

Q92981_HUMAN    258        HQCAHCEKTFNRKDHLKNHFQTH
O76019_HUMAN    259        HQCAHCEKTFNRKDHLKNHLQTH
ZFY_HUMAN       260        HRCEYCKKGFRRPSEKNQHIMRH
ZFX_HUMAN       261        HRCEYCKKGFRRPSEKNQHIMRH
ZFX_BOVIN       262        HRCEYCKKGFRRPSEKNQHIMRH
Q15558_HUMAN    263        HRCEYCKKGFRRPSEKNQHIMRH
ZFX_HUMAN       264        HKCDMCDKGFHRPSELKKHVAAH
ZFY_HUMAN       265        HKCEMCEKGFHRPSELKKHVAVH
Q15558_HUMAN    266        HKCEMCEKGFHRPSELKKHVAVH
Z161_HUMAN      267        YTCSVCGKGFSRPDHLSCHVKHVH
MAZ_HUMAN       268        YNCSHCGKSFSRPDHLNSHVRQVH
O43829_HUMAN    269        YSCEVCGKSFIRAPDLKKHERVH
O00403_HUMAN    270        YSCEVCGKSFIRAPDLKKHERVH
Z151_HUMAN      271        HKCPHCDKKFNQVGNLKAHLKIH
Q92618_HUMAN    272        YKCPYCDHRASQKGNLKIHIRSH
ZFX_HUMAN       273        FRCKRCRKGFRQQSELKKHMKTH
Q14526_HUMAN    274        YPCTICGKKFTQRGTMTRHMRSH
HKR3_HUMAN      275        FECTECGYKFTRQAHLRRHMEIH
Q14526_HUMAN    276        YACDACGMRFTRQYRLTEHMRIH
O75626_HUMAN    277        YECNVCAKTFGQLSNLKVHLRVH
CTCF_HUMAN      278        HKCPDCDMAFVTSGELVRHRRYKH
O75701_HUMAN    279        YSCPDCSLRFAYTST1T1AIHRRIH
O75701_HUMAN    280        YACSDCKSRFTYPYLLAIHQRKH
O43167_HUMAN    281        YACKDCGKVFKYNHFLAIHQRSH
O75850_HUMAN    282        CACPDCGRSFTQRAHMLLHQRSH
O75850_HUMAN    283        YACPDCGRGFSHGQHLARHPRVH
ZN42_HUMAN      284        FVCGDCGQGFVRSARLEEHRRVH
O75467_HUMAN    285        FRCVDCGKAFAKGAVLLSHRRIH
O15015_HUMAN    286        YKCSECGRAYRHRGSLVNHRHSH
O75701_HUMAN    287        YPCPDCGRRFRQRGSLAIHRRAH
Q92951_HUMAN    288        YECAICQRSFRNQSNLAVHRRVH
BCL6_HUMAN      289        YKCDRCQASFRYKGNLASHKTVH
ZN42_HUMAN      290        YACQDCGRRFHQSTKLIQHQRVH
O75701_HUMAN    291        YPCPDCGRRFTYSSLLLSHRRIH
O75701_HUMAN    292        HVCTDCGRRFTYPSLLVSHRRMH
O75701_HUMAN    293        HSCPDCGRNFSYPSLLASHQRVH
ZN42_HUMAN      294        YACVECGERFGRRSVLLQHRRVH
O43298_HUMAN    295        YGCGVCGKKFKMKHHLVGHMKIH
O15209_HUMAN    296        YDCPVCNKKFKMKHHLTEHMKTH
O43829_HUMAN    297        YACHMCDKAFKHKSHLKDHERRH
O00403_HUMAN    298        YACHMCDKAFKHKSHLKDHERRH
O60315_HUMAN    299        HQCQICKKAFKHKHHLIEHSRLH
Q12924_HUMAN    300        HECGICKKAFKHKHHLIEHMRLH
NIL2_HUMAN      301        HECGICKKAFKHKHHLIEHMRLH
Q12924_HUMAN    302        FKCTECGKAFKYKHHLKEHLRIH
O60315_HUMAN    303        FKCTECGKAFKYKHHLKEHLRIH
NIL2_HUMAN      304        FKCTECGKAFKYKHHLKEHLRIH
O95780_HUMAN    305        YKCEECGKAFKRCSHLNEHKRVQ
O95779_HUMAN    306        YKCEECGKAFKRCSHLNEHKRVQ
O43296_HUMAN    307        FKCSECGKVFNKKHLLAGHEKIH
O14709_HUMAN    308        YKCKECGKGFYRHSGLIIHLRRH
O14709_HUMAN    309        HKCKECGKGFIQRSSLLMHLRNH
ZN80_HUMAN      310        CKCVECGKVFNRRSHLLCYRQIH
O43337_HUMAN    311        YKCIECGKAFKRRSHLLQHQRVH
O60765_HUMAN    312        YICKECGKAFTLSTSLYKHLRTH
Z136_HUMAN      313        FECKRCGKAFRSSSSFRLHERTH
Z136_HUMAN      314        FVCKQCGKAFRSASTFQIHERTH
Z136_HUMAN      315        YVCKHCGKAFVSSTSIRIHERTH
Z136_HUMAN      316        FKCKQCGKAFSCSPTLRIHERTH
Z124_HUMAN      317        YVCNNCGKGFRCSSSLRDHERTH
Z177_HUMAN      318        YECKECGKAFRNSSCLRYHVRTH
Z124_HUMAN      319        YECKHCGKAFRYSNCLHYHERTH
O95780_HUMAN    320        YKCKECGKAFNHCSLLTIHERTH
O95779_HUMAN    321        YKCKECGKAFNHCSLLTIHERTH
Z124_HUMAN      322        YPCKQCGKAFRYASSLQKHEKTH
Z136_HUMAN      323        YECKQCGKAFSYLNSFRTHEMIH
Z136_HUMAN      324        YECKQCGKAFSYLPSLRLHERIH
O15060_HUMAN    325        YSCKVCGKRFAHTSEFNYHRRIH
Z136_HUMAN      326        YKCKVCGKPFHSLSPFRIHERTH
Z136_HUMAN      327        YKCKVCGKPFHSLSSFQVHERIH
Z136_HUMAN      328        YKCKVCGKAFDYPSRFRTHERSH
ZN35_HUMAN      329        YVCNECGKAFTCSSYLLIHQRIH
O15322_HUMAN    330        YNCKECGKSFRWSSYLLIHQRIH
Q92951_HUMAN    331        YRCDQCGKAFSQKGSLIVHIRVH
Q92951_HUMAN    332        YQCKECGKSFSQRGSLAVHERLH
Q92951_HUMAN    333        YECQECGKSFRQKGSLTLHERIH
OZF_HUMAN       334        YECNECGKAFSQRTSLIVHVRIH
OZF_HUMAN       335        YECNVCGKAFSQSSSLTVHVRSH
ZN07_HUMAN      336        YVCNDCGKAFSQSSSLIYHQRIH
Z151_HUMAN      337        CQCVMCGKAFTQASSLIAHVRQH
Z177_HUMAN      338        YDCKECGKAFTVPSSLQKHVRTH
OZF_HUMAN       339        FECKDCGKAFIQKSNLIRHQRTH
Z177_HUMAN      340        YECSDCGKAFIDQSSLKKHTRSH
Z177_HUMAN      341        YECSDCGKAFIFQSSLKKHMRSH
O60792_HUMAN    342        YECKECGKAFIRSSSLAKHERIH
Z161_HUMAN      343        YACTYCSKAFRDSYHLRRHESCH
Z161_HUMAN      344        HACEMCGKAFRDVYHLNRHKLSH
MAZ_HUMAN       345        HACEMCGKAFRDVYHLNRHKLSH
O60792_HUMAN    346        FKCDECDKTFTRSTHLTQHQKIH
O60792_HUMAN    347        YKCNECDKAFSRSTHLTEHQNTH
Z263_HUMAN      348        YKCNECGKSFRQGMHLTRHQRTH
Z263_HUMAN      349        HKCLECGKCFSQNTHLTRHQRTH
Z135_HUMAN      350        YECSQCGKAFRQSTHLTQHQRIH
Z135_HUMAN      351        YECHDCGKSFRQSTHLTQHRRIH
Z135_HUMAN      352        YECSECGKAFRQSIHLTQHLRIH
O75467_HUMAN    353        YECAQCGKAFSQTSHLTQHQRIH
ZN07_HUMAN      354        YECLQCGKAFSMSTQLTIHQRVH
O95270_HUMAN    355        YPCQFCGKRFHQKSDMKKHTYIH
GFI1_HUMAN      356        YPCQYCGKRFHQKSDMKKHTFIH
O75850_HUMAN    357        FPCTECEKRFRKKTHLIRHQRIH
Q15552_HUMAN    358        FRCDECGMRSIQKYHMERHKRTH
O43591_HUMAN    359        FRCDECGMRFIQKYHMERHKRTH
Q15552_HUMAN    360        FQCSQCDMRFIQKYLLQRHEKIH
O43591_HUMAN    361        FQCSQCDMRFIQKYLLQRHEKIH
O75850_HUMAN    362        FPCSECDKRFSKKAHLTRHLRTH
O75850_HUMAN    363        YPCAECGKRFSQKIHLGSHQKTH
O94892_HUMAN    364        FMCSECGKGFTMKRYLIVHQQIH
O43336_HUMAN    365        YQCSECGKSFIYKQSLLDHHRIH
O43167_HUMAN    366        FKCNECGKGFAQKHSLQVHTRMH
O43167_HUMAN    367        YTCDQCGKYFSQNRQLKSHYRVH
PLZF_HUMAN      368        YECNGCDKKFSLKHQLETHYRVH
HKR3_HUMAN      369        YACPTCHKKFLSKYYLKVHNRKH
O43336_HUMAN    370        YVCNVCGKSFRHKQTFVGHQQRIH
O43336_HUMAN    371        YVCNICGKSFLHKQTLVGHQQRIH
Z134_HUMAN      372        YDCSDCGKSFGHKYTLIKHQRIH
Z200_HUMAN      373        YDCNHCGKSFNHKTNLNKHERIH
O15361_HUMAN    374        YDCNHCGKSFNHKTNLNKHERIH
ZN84_HUMAN      375        YDCNHCGKAFSRKSQLVRHQRTH
ZN84_HUMAN      376        FECRECGKAFSRKSQLVTHHRTH
ZN07_HUMAN      377        YGCRECGKAFSQQSQLVRHQRTH
ZN84_HUMAN      378        YRCIECGKAFSQKSQLINHQRTH
ZN84_HUMAN      379        YGCSECRKAFSQKSQLVNHQRIH
ZN84_HUMAN      380        HGCIQCGKAFSQKSHLISHQMTH
ZN84_HUMAN      381        YNCSQCGKAFSQKSQLTSHQRTH
ZN84_HUMAN      382        YVCSECGKAFCQKSHLISHQRTH
Z157_HUMAN      383        FECNECGKSFGRKSQLILHTRTH
ZN84_HUMAN      384        FECSECGKAFSRKSHLIPHQRTH
ZN84_HUMAN      385        YECGECGKAFSRKSHLISHWRTH
Z136_HUMAN      386        YHCKECGKAYSCRASFQRHMLTH
Z136_HUMAN      387        YECKECGEAFSCIPSMRRHMIKH
Z136_HUMAN      388        YECQECGKAFTCITSVRRHMIKH
ZN80_HUMAN      389        YECQECGKAFPEKVDFVRHMRIH
O43338_HUMAN    390        YVCGECGKAFMFKSKLVRHQRTH
O43338_HUMAN    391        YECDECGKAFGSKSTLVRHQRTH
Z133_HUMAN      392        YACGECGRGFSQKSNLVAHQRTH
Z133_HUMAN      393        YMCSECGRGFSQKSNLIIHQRTH
Z133_HUMAN      394        YACKDCGRGFSQQSNLIRHQRTH
Z133_HUMAN      395        YACSDCGLGFSDRSNLISHQRTH
Z133_HUMAN      396        YACRECGRGFNRKSTLIIHERTH
Z133_HUMAN      397        YVCRECGRGFSHQAGLIRHKRKH
Z133_HUMAN      398        CVCRECGQGFLQKSHLTLHQMTH
Z133_HUMAN      399        YVCRECGKGFSQKSAWRHQRTH
O94892_HUMAN    400        YICSECGKGFPRKSNLIVHQRNH
O94892_HUMAN    401        YICNECGKGFPGKRNLIVHQRNH
O94892_HUMAN    402        YTCSECGKGFPLKSRLIVHQRTH
O94892_HUMAN    403        YICSECGKGFTTKHYVIIHQRNH
O94892_HUMAN    404        YICSECGKGFTGKSMLIIHQRTH
O94892_HUMAN    405        YLCSECGKGFTVKSMLIIHQRTH
O94892_HUMAN    406        YGCNECGKGFTMKSRLIVHQRTH
O94892_HUMAN    407        YICNECGKGFTMKSRMIEHQRTH
O94892_HUMAN    408        FICSECGKVFTMKSRLIEHQRTH
O94892_HUMAN    409        YICNECGKGFAFKSNLWHQRTH
Z186_HUMAN      410        YECNECGKTFHQKSFLTVHQRTH
Z186_HUMAN      411        YECNELGKTFHCKSFLTVHQKTH
Z186_HUMAN      412        YGCNECGKTVRCKSFLTLHQRTH
ZN35_HUMAN      413        YTCNECGKAFRQRSSLTVHQRTH
Z186_HUMAN      414        YQCSECGKTFSQKSYLTIHHRTH
Z157_HUMAN      415        YECSECGKTFRVKISLTQHHRTH
Z186_HUMAN      416        YKCIECGKTFTVNQLLTLHHRTH
Z157_HUMAN      417        YECTECGKTFSEKATLTIHQRTH
ZN84_HUMAN      418        YACSDCRKAFFEKSELIRHQTIH
ZN84_HUMAN      419        YECSLCRKAFFEKSELIRHLRTH
Z140_HUMAN      420        YECNECRKALRCHSFLIKHQRIH
ZN84_HUMAN      421        YECNECRKAFREKSSLINHQRIH
ZN84_HUMAN      422        YECSECRKAFRERSSLINHQRTH
ZN84_HUMAN      423        YECSECGKAFGEKSSLATHQRTH
ZN84_HUMAN      424        YECSECGKAFSEKLSLTNHQRIH
O43339_HUMAN    425        YECSKCGKAFRGKYSLVQHQRVH
Z157_HUMAN      426        YECSECGKIFSMKKSLCQHRRTH
Z157_HUMAN      427        YECGECGKFFRMKMTLNNHQRTH
Z157_HUMAN      428        YECGECGKNFRAKKSLNQHQRIH
O43361_HUMAN    429        YKCSECGKAFSLKHNWQHLKIH
Z134_HUMAN      430        YECSECGKAFSRKATLVQHQRIH
Z134_HUMAN      431        YKCSECGKAFSRKDTLVQHQRIH
Z134_HUMAN      432        YECSECGKTFSRKDNLTQHKRIH
O14709_HUMAN    433        YKCKECGKVFIRSKSLLLHQRVH
O14709_HUMAN    434        YECDECGKCFILKKSLIGHQRIH
O14709_HUMAN    435        YECNECGKVFILKKSLILHQRFH
O14709_HUMAN    436        YKCNKCQKAFILKKSLILHQRIH
Z140_HUMAN      437        YACAECDKAFSRSFSLILHQRTH
Z140_HUMAN      438        YGCHECGKTFGRRFSLVLHQRTH
O95878_HUMAN    439        YACAQCGKTFNNTSNLRTHQRIH
O14709_HUMAN    440        YKCDMCCKHFNKISHLINHRRIH
ZN83_HUMAN      441        FKCDICGKIFNKKSNLASHQRIH
ZN07_HUMAN      442        HQCEDCEKIFRWRSHLIIHQRIH
Z137_HUMAN      443        HKCDDCGKVLTSRSHLIRHQRIH
Z140_HUMAN      444        HECKDCNKTFSYLSFLIEHQRTH
Z189_HUMAN      445        HKCSDCGKAFSWKSHLIEHQRTH
O75802_HUMAN    446        HKCSDCGKAFSWKSHLIEHQRTH
O14709_HUMAN    447        YKCNDCGKVFSYRSNLIAHQRIH
O43309_HUMAN    448        YGCDDCGKAFSQHSHLIEHQRIH
O75123_HUMAN    449        YTCDQCGKGFGQSSHLMEHQRIH
O43336_HUMAN    450        YNCTACEKAFIYKNKLVEHQRIH
O43309_HUMAN    451        YKCDVCEKAFIQRTSLTEHQRIH
O60792_HUMAN    452        YKCDQCGKGFIEGPSLTQHQRIH
O43309_HUMAN    453        YKCDKCGKAFTQRSVLTEHQRIH
ZN91_HUMAN      454        YKCEECGKAFKQLSTLTTHKRIH
ZN91_HUMAN      455        YKCKECGKAFKQFSTLTTHKIIH
ZN91_HUMAN      456        YKCKECDKTFKRLSTLTKHKIIH
ZN91_HUMAN      457        YKCKECDKTFKRLSTLTKHKIIH
ZN85_HUMAN      458        YKCEKCGKAFNHFSHLTTHKIIH
ZN85_HUMAN      459        YKCEECGKAFNRFSTLTTHKIIH
ZN43_HUMAN      460        YKCEECGKAFNQFSTLTKHKIIH
ZN43_HUMAN      461        YTCEECGKVFNWSSRLTTHKRIH
ZN43_HUMAN      462        YKCEECGKAFNKSSILTTHKIIR
O75437_HUMAN    463        YKWEKFGKAFNRSSHLTTDKITH
O43345_HUMAN    464        YKCEEGGKAFNWSSTLTYYKSAH
ZN91_HUMAN      465        YKCEECGKAFNQSSNLTTHKIIH
ZN91_HUMAN      467        YKCEECGKAFNRSSKLTTHKIIH
Q02313_HUMAN    468        YKCEECGKAFNQSSTLTTHNIIH
ZN91_HUMAN      469        YKCEECGKAFNHSSSLSTHKIIH
ZN43_HUMAN      470        YKCEECGKAFKLSSTLSTHKIIH
ZN91_HUMAN      471        YKCEECGKAFSQSSTLTTHKIIH
Q02313_HUMAN    472        YKCEECGKAFNQSSTLTTHKRIH
O95780_HUMAN    473        YKCEECGKAFNSSSILTEHKVIH
O95779_HUMAN    474        YKCEECGKAFNSSSILTEHKYIH
ZN91_HUMAN      475        YKCKECGKAFKHSSALAKHKIIH
ZN85_HUMAN      476        YKCKECGKAFKHSSTLTKHKIIH
ZN85_HUMAN      477        YKCEECDKAFKWSSVLTKHKIIH
ZN43_HUMAN      478        YKCEECGKAFKWSSTLTKHKIIH
ZN85_HUMAN      479        YKCEECGKGFKWPSTLTIHKIIH
ZN91_HUMAN      480        YKCGECGKAFKESSALTKHKIIH
ZN91_HUMAN      481        YKCEECGKAFRKSSTLTEHKIIH
ZN91_HUMAN      482        YKCEECGKAFRQSSTLTKHKIIH
Q02313_HUMAN    483        YKCGECGKAFNQSSALNTHKIIH
ZN91_HUMAN      484        CKCKECEKTFHWSSTLTNHKEIH
O75437_HUMAN    485        YKCKECGKTFNWSSTLTNHRKIY
ZN91_HUMAN      486        YKCKECGKAFSNSSTLANHKITH
ZN91_HUMAN      487        YKCKECGKAFSNSSTLANHKITH
O43345_HUMAN    488        YKCKECGKTFIKVSTLTTHKAIH
O43345_HUMAN    489        YKCEECGKTFSKVSTLTTHKAIH
O43345_HUMAN    490        YKCEECGKTFSKVSTLTTHKAIH
O43345_HUMAN    491        YKCEECGKAFSKVSTLTTHKAIH
O43345_HUMAN    492        YKCKECGKAFSKVSTLITHKAIH
O95270_HUMAN    493        YACRMCGKAFKRSSTLSTHLLIH
GFI1_HUMAN      494        YDCKICGKSFKRSSTLSTHLLIH
O75346_HUMAN    495        YKCIICGKAFKRSSTLTTHKKIH
ZN43_HUMAN      496        YKCKECGKAFNQYSNLTTHNKIH
ZN85_HUMAN      497        YKCKECGKAFNRSSTLTTHRKIH
ZN91_HUMAN      498        YKCSEECDKAFIWSSTLTEHKRIH
ZN91_HUMAN      499        YKCEECGKAFISSSTLNGHKRIH
ZN43_HUMAN      500        YKCEECGKAFNYSSHLNTHKRIH
O95780_HUMAN    501        YKCEECGKAFNWSSILTEHKRIH
O95779_HUMAN    502        YKCEECGKAFNWSSILTEHKRIH
O43345_HUMAN    503        YKCEECGKAFNWSSNLMEHKRIH
O43345_HUMAN    504        YKCEECGKAFNWSSNLMEHKRIH
O43345_HUMAN    505        YKCEECGKAFNWSSNLMEHKKIH
O43345_HUMAN    506        YKCEECGKAFNWSSNLMEHKKIH
ZN91_HUMAN      507        FKCKECGKAFIWSSTLTRHKRIH
ZN91_HUMAN      508        FKCKECGKGFIWSSTLTRHKRIH
ZN91_HUMAN      509        YKCEECGKAFLWSSTLRRHKRIH
ZN91_HUMAN      510        YKCEECGKAFLWSSTLTRHKRIH
Q02313_HUMAN    511        YKCEAYGRAFNWSSTLNKHKRIH
ZN91_HUMAN      512        YKFEECGKAFRQSLTLNKHKIIH
Z141_HUMAN      513        YKCEECGKAFRRSTDRSQHKKIH
O75346_HUMAN    514        YKCEECGKAFNWSSDLNKHKKIH
ZN91_HUMAN      515        YKCEECGKAFNWSSSLTKHKRIH
ZN91_HUMAN      516        YKCEECGKAFNWSSSLTKHKRFH
ZN85_HUMAN      517        YKCEECGKAFNWSSTLTKHKRIH
ZN43_HUMAN      518        YKCEECGKAFNWPSTLTKHNRIH
ZN43_HUMAN      519        YKCEECGKAFNWPSTLTKHKRIH
O75437_HUMAN    520        YKCEECGKAFFWSSTLTKHKRIH



WO 02/099084    PCT/US02/22272 



O95780_HUMAN    521        YKCEECGKAFNWCSSLTKHKRIH
O95779_HUMAN    522        YKCEECGKAFNWCSSLTKHKRIH
ZN43_HUMAN      523        YKCEECGKAFSRSSNLTKHKKIH
ZN43_HUMAN      524        YKCTECGEAFSRSSNLTKHKKIH
ZN91_HUMAN      525        YKCEECGKAFSRSSTLTKHKTIH
O75437_HUMAN    526        YKCEECGKAFNRSSTFTKHKVIH
Z141_HUMAN      527        YKCEECGKAFNRFTTLTKHKRIH
Z141_HUMAN      528        YKCEECGKAFNRSTTLTKHKRIH
ZN43_HUMAN      529        CKCEKCGKAFNCPSIITKHKRIN
O43345_HUMAN    530        YKCEACGKAYNTFSILTKHKVIH
O43345_HUMAN    531        YKCEECGKAFSTFSILTKHKVIH
O43345_HUMAN    532        YKCEECGKSFSTFSILTKHKVIH
O43345_HUMAN    533        YKCEECGKSFSTFSVLTKHKVIH
O43345_HUMAN    534        YKCEECGKGFVMFSILAKHKVIH
O43345_HUMAN    535        YKCEECGKGFSMFSILTKHEVIH
O43345_HUMAN    536        YKCEECGKGFSMFSILTKHEVIH
O43345_HUMAN    537        YKCKECGKAFSKFSILTKHKVIH
O43345_HUMAN    538        YKCKECGKAFSKFSILTKHKVIH
O43345_HUMAN    539        YKCKECGKAFSKFSILTKHKVIH
O43345_HUMAN    540        YRCKECGKAFSKFSILTKHKVIH
Z195_HUMAN      541        FKCEECDSIFKWFSDLTKHKRIH
O95780_HUMAN    542        YKCEKCDKVFKRFSYLTKHKRIH
O95779_HUMAN    543        YKCEKCDKVFKRFSYLTKHKRIH
O95780_HUMAN    544        CICEECGKTFKWFSYLTKHKRIH
O95779_HUMAN    545        CICEECGKTFKWFSYLTKHKRIH
ZN43_HUMAN      546        YKCEECGKAFNHFSILTKHKRIH
ZN91_HUMAN      547        YKCEKCCKAFNQSSILTNHKKIH
Q02313_HUMAN    548        YKCEKCVRAFNQASKLTEHKLIH
ZN85_HUMAN      549        YKSKECEKAFNQSSKLTEHKKIH
ZN43_HUMAN      550        YKCKECAKAFNQSSNLTEHKKIH
ZN85_HUMAN      551        YKCEECGKAFNQSSKLTKHKKIH
ZN85_HUMAN      552        YKCEECGKAFNQSSNLIKHKKIH
O43345_HUMAN    553        YKCEECGKAFNRSAILIKHKRIH
O43345_HUMAN    554        YKCEECGKAFNQSAILIKHKRIH
O43345_HUMAN    555        YKCEECGKAFNQSAILTKHKIIH
ZN43_HUMAN      556        YKCEVCGKAFNQFSNLTTHKRIH
ZN43_HUMAN      557        YTCEECGKAFNQFSNLTTHKRIH
O75346_HUMAN    558        YRCEECGKAFNQSANLTTHKRIH
ZN85_HUMAN      559        YTCEECGKAFNQSSNLTKHKRIH
Z141_HUMAN      560        YKCKDCDKAFKRFSHLNKHKKIH
Z141_HUMAN      561        YKCKECDKAFKQFSLLSQHKKIH
Q02313_HUMAN    562        YKCEECGKAFKQFSNLTDHKKIH
ZN43_HUMAN      563        YKCEECGKAFTQSSNLTTHKKIH
ZN43_HUMAN      564        YKCEECGKAFTQSSNLTTHKKIH
ZN85_HUMAN      565        YKCEECGKAFKQSSNLTTHKIIH
Q02313_HUMAN    566        YKCEECGKAFNQLSNLTRHKVIH
ZN85_HUMAN      567        YECEKCGKAFNQSSNLTRHKKSH
O95780_HUMAN    568        YNCEECGKAFNRCSHLTRHKKIH






O95779_HUMAN    569        YNCEECGKAFNRCSHLTRHKKIH
O95780_HUMAN    570        YTCEDCGRAFNRHSHLTKHKTIH
O95779_HUMAN    571        YTCEDCGRAFNRHSHLTKHKTIH
Q02313_HUMAN    572        YECEECGKAFNRSSKLTEHKYIH
ZN91_HUMAN      573        YKCEECGKAFNRSSNLTIHKFIH
ZN91_HUMAN      574        YKCEECGKAFNRSSNLTIHKFIH
ZN43_HUMAN      575        YKCEKCGKAFNRPSNLIEHKKIH
Z141_HUMAN      576        YTCEECRKIFTSSSNFAKHKRIH
Z141_HUMAN      577        FTCEECGSIFTTSSHFAKHKIIH
Z141_HUMAN      578        YTCEECGKAFKWSLIFNEHKRIH
Z141_HUMAN      579        YTCEECGKAFRQSSKLNEHKKVH
O43345_HUMAN    580        YKCEECGKAYKWSSTLSYHKKIH
O43345_HUMAN    581        YKCEECGKAYKWSSTLSYHKKIH
O43345_HUMAN    582        YKCEECGKAYKWPSTLSYHKKIH
O43345_HUMAN    583        YKCEECGKAYKWPSTLSYHKKIH
O43345_HUMAN    584        YKCEECGKAYKWPSTLRYHKKIH
O43345_HUMAN    585        YKCEECGKGFSWSSTLSYHKKIH
O43345_HUMAN    586        YKCEECGKAFSWLSVFSKHKKIH
O43345_HUMAN    587        YKCEECGKAFSWLSVFSKHKKTH
O95780_HUMAN    588        YKCEECGKAFHWCSPFVRHKKIH
O95779_HUMAN    589        YKCEECGKAFHWCSPFVRHKKIH
Z195_HUMAN      590        YTCEECGNIFKQLSDLTKHKKTH
Z195_HUMAN      591        YKCEECGRAFMWFSDITKHKQTH
O43345_HUMAN    592        YKCEECGKAFSWPSRLTEHKATH
O43345_HUMAN    593        YKCEECDKAFSWPSSLTEHKATH
ZN43_HUMAN      594        YKCEECGKAFKWSSKLTEHKITH
ZN43_HUMAN      595        YKCEECGKAFKWSSKLTEHKLTH
ZN91_HUMAN      596        YKCEECGKAFSHSSALAKHKRIH
ZN91_HUMAN      597        YKCEECGKAFSHSSALAKHKRIH
ZN91_HUMAN      598        YKCEECGKAFSHSSTLAKHKRIH
ZN91_HUMAN      599        YKCEECGKAFSQPSHLTTHKRMH
ZN91_HUMAN      600        YKCEECGKAFSQSSTLTRHKRLH
ZN91_HUMAN      601        YKCEECGKAFSQSSTLTRHTRMH
Z124_HUMAN      602        YECMECGKALGFSRSLNRHKRIH
Z141_HUMAN      603        YKCDECGKAFGRSRVLNEHKKIH
ZN74_HUMAN      604        YKCDECGKAFTWSTNLLEHRRIH
Z195_HUMAN      605        YKCDECGKAYTQSSHLSEHRRIH
Z195_HUMAN      606        YKCDECGKNFTQSSNLIVHKRIH
Z195_HUMAN      607        YKCDECGKNFTQSSNLIVHKRIH
ZN80_HUMAN      608        YKCKECGSVFNKNSLLVRHQQIH
Z165_HUMAN      609        FGCKECGRAFNLNSHLIRHQRIH
Q02313_HUMAN    610        YKCKECGKAFNQTSHLIRHKRIH
O60792_HUMAN    611        YKCNECGRAFNQNIHLTQHKRIH
ZN74_HUMAN      612        YRCGECGKAFNQRTHLTRHHRIH
Q15776_HUMAN    613        YKCKECGKAFNGNTGLIQHLRIH
O43309_HUMAN    614        YKCDECGNAFRGITSLIQHQRIH
O43309_HUMAN    615        YKCEECGKAFRGRTVLIRHKIIH
O75123_HUMAN    616        YVCNECGKRFSQTSNFTQHQRIH






O60792_HUMAN    617        YKCNECGKAFNGPSTFIRHHMIH
O43296_HUMAN    618        FVCSECGKAFTHCSTFILHKRAH
O43337_HUMAN    619        YECSQCRKAFTHRSTFIRHNRTH
O43296_HUMAN    620        YKCNECGKAFTHRSNFVLHNRRH
OZF_HUMAN       621        YGCNECGKAFSQFSTLALHLRIH
ZN83_HUMAN      622        YKCNERGKAFHQGLHLPIHQIIH
ZN07_HUMAN      623        YKCNECGKAFSQNSTLFQHQIIH
ZN83_HUMAN      624        YKCNECGKVFSRNSYLAQHLIIH
ZN83_HUMAN      625        YECNKCGKVFSRNSYLVQHLIIH
ZN83_HUMAN      626        YKCNECGKVFGLNSSLAHHRKIH
ZN83_HUMAN      627        YKCNECGKYFHQISHLAQHRTIH
ZN83_HUMAN      628        YKCNECGKVFHNMSHLAQHRRIH
ZN83_HUMAN      629        YKCNECGKVFNQISHLAQHQRIH
ZN83_HUMAN      630        YRCNVCGKVFHHISHLAQHQRIH
ZN83_HUMAN      631        YKCDECGKVFSQNSYLAYHWRIH
Z189_HUMAN      632        YKCDECGKTFSVSAHLVQHQRIH
O75802_HUMAN    633        YKCDECGKTFSVSAHLVQHQRIH
ZN83_HUMAN      634        YKCDECDKAFSQNSHLVQHHRIH
O60792_HUMAN    635        YKCDECGKAFSQRTHLVQHQRIH
O43361_HUMAN    636        YECGESSKVFKYNSSLIKHQIIH
ZN83_HUMAN      637        FKCNECGKAFSMRSSLTNHHAIH
O60792_HUMAN    638        YKCNECGKAFSYCSSLTQHRRIH
Z137_HUMAN      639        YKYHDCGKVFSQASSYAKHRRIH
O14709_HUMAN    640        YKCEDCGKAFSYNSSLLVHRRIH
Z124_HUMAN      641        YVCMECGKAFSCLSSLQGHIKAH
O60792_HUMAN    642        YQCHECGKTFSYGSSLIQHRKIH
O60792_HUMAN    643        YDCAECGKSFSYWSSLAQHLKIH
ZN83_HUMAN      644        YKCNECGKVFSHKSSLVNHWRIH
ZN83_HUMAN      645        YKCNECGKVFSHKSSLVNHWRIH
Z132_HUMAN      646        YKCSECGKFFSRKSSLICHWRVH
O43339_HUMAN    647        YKCNECGKFFSQTSHLNDHRRIH
O43338_HUMAN    648        YECSECGKSFSQTSHLNDHRRIH
ZN45_HUMAN      649        YKCNACGKSFSYSSHLNIHCRIH
ZN45_HUMAN      650        YKCGTCGKGFSRSSDLNVHCRIH
Z263_HUMAN      651        YKCPLCGKNFSNNSNLIRHQRIH
Z202_HUMAN      652        YTCPTCGKSFSRGYHLIRHQRTH
O75850_HUMAN    653        FSCPQCGKSFSRKTHLVRHQLIH
Z205_HUMAN      654        YACPLCGKSFSRRSNLHRHEKIH
O15535_HUMAN    655        HQCIECGKSFNRHCNLIRHQKIH
ZN24_HUMAN      656        YECVQCGKSYSQSSNLFRHQRRH
Z191_HUMAN      657        YECVQCGKSYSQSSNLFRHQRRH
Q99592_HUMAN    658        YTCTQCGKSFQYSHNLSRHAWH
Q13397_HUMAN    659        YTCTQCGKSFQYSHNLSRHAWH
Z189_HUMAN      660        YLCRQCGKSFSQLCNLIRHQGVH
O75802_HUMAN    661        YLCRQCGKSFSQLCNLIRHQGVH
Z189_HUMAN      662        YQCKECGKSFSQLCNLTRHQRIH
O75802_HUMAN    663        YQCKECGKSFSQLCNLTRHQRIH
Z263_HUMAN      664        YKCTLCGENFSHRSNLIRHQRIH



Z263_HUMAN    665   YKCPECGEIFAHSSNLLRHQRIH
O95878_HUMAN  666   YKCSECGKSFSRSSNRIRHERIH
Z263_HUMAN    667   YTCHECGDSFSHSSNRIRHLRTH
O43336_HUMAN  668   YVCIICGKSFIRSSDYMRHQRIH
O43336_HUMAN  669   YVCMECGKSFIHSYDRIRHQRVH
BCL6_HUMAN    670   YRCNICGAQFNRPANLKTHTRIH
Z133_HUMAN    671   YKCGECGLSFSKMTNLLSHQRIH
ZN75_HUMAN    672   YRCSWCGKSFSHNTNLHTHQRIH
O60893_HUMAN  673   YKCNECERSFTRNRSLIEHQKIH
ZN74_HUMAN    674   YKCSECGRAFSQNHCLIKHQKIH
O14709_HUMAN  675   YACSECGKGFTYNRNLIEHQRIH
Z177_HUMAN    676   YKCFQCEKAFSTSTNLIMHKRIH
O60792_HUMAN  677   YKCNECEKAFSRSENLINHQRIH
O94892_HUMAN  678   YGCTLCAKVFSRKSRLNEHQRIH
Z189_HUMAN    679   YHCTKCKKSFSRNSLLVEHQRIH
O75802_HUMAN  680   YHCTKCKKSFSRNSLLVEHQRIH
O43309_HUMAN  681   YQCTQCNKSFSRRSILTQHQGVH
O15535_HUMAN  682   YQCSQCSKSYSRRSFLIEHQRSH
Z205_HUMAN    683   YTCPACRKSFSHHSTLIQHQRIH
Z189_HUMAN    684   YTCIECGKSFSRSSFLIEHQRIH
O75802_HUMAN  685   YTCIECGKSFSRSSFLIEHQRIH
Z189_HUMAN    686   FQCNECGKSFSRSSFVIEHQRIH
O75802_HUMAN  687   FQCNECGKSFSRSSFVIEHQRIH
Z189_HUMAN    688   YLCTVCGKSFSRSSFLIEHQRIH
O75802_HUMAN  689   YLCTVCGKSFSRSSFLIEHQRIH
O14709_HUMAN  690   YECHVCRKVLTSSRNLMVHQRIH
O14709_HUMAN  691   YECDKCRKSFTSKRNLVGHQRIH
ZN35_HUMAN    692   YECNECGKTFTRSSNLIVHQRIH
O75123_HUMAN  693   YECNECGKSFIRSSSLIRHYQIH
O43296_HUMAN  694   YECVECGKSFCWSTNLIRHAIIH
O43296_HUMAN  695   YECSECGKVFLESAALTHHYVIH
O43337_HUMAN  696   YECTQCGKAFHRSTYLIQHSVIH
O43296_HUMAN  697   YECTECGKTFIKSTHLLQHHMIH
O75290_HUMAN  698   YECKECGKYFSRSANLIQHQSIH
Z205_HUMAN    699   YACTDCGKRFGRSSHLIQHQIIH
Z165_HUMAN    700   YECSECGKTFRVSSHLIRHFRIH
Q15776_HUMAN  701   YECDECGKTFRRSSHLIGHQRSH
Q15776_HUMAN  702   YECNECGKAFSHSSHLIGHQRIH
Z189_HUMAN    703   YECNYCGKTFSVSSTLIRHQRIH
O75802_HUMAN  704   YECNYCGKTFSVSSTLIRHQRIH
O43337_HUMAN  705   YECNACGKAFSQSSTLIRHYLIH
ZN07_HUMAN    706   YECSECGKAFSRSSYLIEHQRIH
Z132_HUMAN    707   YECSECGKAFAHSSTLIEHWRVH
O43340_HUMAN  708   YECSECGKAFSCNIYLIHHQRFH
Z135_HUMAN    709   YECGECGKAFSQSTLLTEHRRIH
O43338_HUMAN  710   YECGECGKSFSQSSNLIEHCRIH
O43338_HUMAN  711   YECGKCGKSFTQHSGLILHRKSH
Z140_HUMAN    712   YECDECGKVFTWHASLIQHTKSH
Q13398_HUMAN  713   YACPECGKSFSQIYSLNSHRKVH
Q13398_HUMAN  714   YECSKCGKSFKQSSSFSSHRKVH
O43340_HUMAN  715   YECSECGKSFSHSTNLFRHWRVH
O43340_HUMAN  716   YECSECGKSFSHSTNLYRHRSAH
O43340_HUMAN  717   YECSECGKSFSQSSGLLRHRRVH
O43340_HUMAN  718   YKCSECGKSFSQSSGFLRHRKAH
O43340_HUMAN  719   YECSECGKVFSQSBGLFRHRRAH
O43340_HUMAN  720   YECDECGKSYSQSSALLQHRRVH
Q13398_HUMAN  721   YECSECGKSFSQSSSLIQHRRVH
Q13398_HUMAN  722   YECGECGKSFSQRSNLMQHRRVH
Z132_HUMAN    723   YECSECRKSFSRSSSLIQHWRIH
Z132_HUMAN    724   YECSQCGKSFSRSSALTQHWRVH
Q13398_HUMAN  725   HECNECGKSFSRSSSLIHHRRLH
O43339_HUMAN  726   YKCGECGNSFSQSAILNQHRRIH
O43339_HUMAN  727   YKCGDCGKSFSQSSILIQHRRIH
O60765_HUMAN  728   YRCEECGISFGQSSALIQHRRIH
O43338_HUMAN  729   YECGQCGKSFSLKCGLIQHQLIH
O43339_HUMAN  730   YECGQCGKSFSQKSGLIQHQWH
O43338_HUMAN  731   YDCGQCGKSFIQKSSLIQHQWH
Q13398_HUMAN  732   YQCSQCGKSFGCKSVLIQHQRVH
O43340_HUMAN  733   YVCSECGKSFGQKSVLIQHQRVH
O43340_HUMAN  734   YDCSECGKSFRQVSVLIQHQRVH
Q13398_HUMAN  735   YECSECSKSFSCKSNLIKHLRVH
O43339_HUMAN  736   YECGQCGKSFSQKATLIKHQRVH
O43338_HUMAN  737   YVCGQCGKSFSQRATLIKHHRVH
O43339_HUMAN  738   YECSQCGKSFSQKATLVKHQRVH
Q13398_HUMAN  739   YECSECGKSFSQNFSLIYHQRVH
O43340_HUMAN  740   YECSVCGKSFIRKTHLIRHQTVH
O43340_HUMAN  741   YECSECEKSFSCKTDLIRHQTVH
O43340_HUMAN  742   YECRECGKSFTRKNHLIQHKTVH
Z189_HUMAN    743   HKCEECGKGFVRKAHFIQHQRVH
O75802_HUMAN  744   HKCEECGKGFVRKAHFIQHQRVH
O43340_HUMAN  745   HECSECGKSFSRKTHLTQHQRVH
O43309_HUMAN  746   YQCKECGKSFSQSGLIQHQRIH
Q15776_HUMAN  747   YQCNQCGKAFSQSAGLILHQRIH
O15535_HUMAN  748   YHCKECGKVFSQSAGLIQHQRIH
O60792_HUMAN  749   YNCNECRKTFSQSTYLIQHQRIH
Q15776_HUMAN  750   YHCKECGKAFSQNTGLILHQRIH
ZN84_HUMAN    751   YGCMECGRAFSEKSNLINHQRIH
Q15776_HUMAN  752   YKCNECGRAFSQKSGLIEHQRIH
Z189_HUMAN    753   HKCDECGKAFSRNSGLIQHQRIH
O75802_HUMAN  754   HKCDECGKAFSRNSGLIQHQRIH
Z189_HUMAN    755   HKCEECGKAFSRSSGLIQHQRIH
O75802_HUMAN  756   HKCEECGKAFSRSSGLIQHQRIH
ZN24_HUMAN    757   YKCLECGKAFSQNSGLINHQRIH
Z191_HUMAN    758   YKCLECGKAFSQNSGLINHQRIH
OZF_HUMAN     759   YQCSECGKAFSQKSHHIRHQKIH
Q15776_HUMAN  760   YQCNECGKAFIQRSSLIRHQRIH
ZN35_HUMAN    761   YDCSECGKAFSQLSSLIVHQRIH
ZN07_HUMAN    762   YRCEECGKAFGQSSSLIHHQRIH
O60765_HUMAN  763   FKCNTCGKTFRQSSSRIAHQRIH
OZF_HUMAN     764   FKCSECGTAFGQKKYLIKHQNIH
OZF_HUMAN     765   FECNECGKAFSQKQYVIKHQNTH
Q92951_HUMAN  766   FECTHCGKSFRAKGNLVTHQRIH
OZF_HUMAN     767   FECNECGKSFSQKENLLTHQKIH
ZN74_HUMAN    768   FKCNECGKAFSSHAYLIVHRRIH
ZN74_HUMAN    769   FKCADCGKGFSCHAYLLVHRRIH
O60765_HUMAN  770   FKCSECGRAFSQSASLIQHERIH
ZN35_HUMAN    771   FECHECGKAFIQSANLWHQRIH
ZN35_HUMAN    772   FTCSVCGKGFSQSANLWHQRIH
ZN35_HUMAN    773   FACNDGGKAFTQSANLIVHQRSH
O14709_HUMAN  774   YKCNECGKDFSQNKNLWHQRMH
O14709_HUMAN  775   YKCDECGKTFAQTTYLIDHQRLH
O14709_HUMAN  776   YKCNECGKVFSQNAYLIDHQRLH
O14709_HUMAN  777   YKCTECGKAFTQSAYLFDHQRLH
O14709_HUMAN  778   YKCNECGKAFSQSAYLLNHQRIH
Z157_HUMAN    779   YQCNECGKSFRVHSSLGIHQRIH
O60765_HUMAN  780   YNCNECGKALSSHSTLIIHERIH
EVI1_HUMAN    781   YKCDQCPKAFNWKSNLIRHQMSH
Q15776_HUMAN  782   YQCNVCGKAFSYRSALLSHQDIH
O43309_HUMAN  783   YECNECGKAFVYNSSLVSHQEIH
Z200_HUMAN    784   YGCKKCGRRFGRLSNCTRHEKTH
O15361_HUMAN  785   YGCKKCGRRFGRLSNCTRHEKTH
ZN07_HUMAN    786   YKCNDCGKAFNRSSRLTQHQKIH
ZN74_HUMAN    787   YQCGSCGKAFTCHSSLTVHEKIH
ZN35_HUMAN    788   YVCSKCGKAFTQSSNLTVHQKIH
Z140_HUMAN    789   YECIECGKAFRRFSHLTRHQSIH
O60893_HUMAN  790   YQCNMCGKAFRRNSHLLRHQRIH
Q13396_HUMAN  791   YSCTECEKSFVQKQHLLQHQKIH
O43361_HUMAN  792   YECTQCAKAFVRKSHLVQHEKIH
O43361_HUMAN  793   YECTECEKAFVRKSHLVQHQKIH
O75123_HUMAN  794   YECKECGKAFLQKAHLTEHQKIH
O75290_HUMAN  795   YECKECGKGFNRGAHLIQHQKIH
O75290_HUMAN  796   YECKECGKGFNRGAHLIQHQKIH
O75290_HUMAN  797   FECKECGKAFRLHMQLIRHQKLH
O75290_HUMAN  798   FECKECGKAFRLHMHLIRHQKLH
O75290_HUMAN  799   FECKECGKAFRLHIQFTRHQKFH
O75290_HUMAN  800   YECKECGKAFRLYLQLSQHQKTH
Z140_HUMAN    801   YECTECGKAFSRASNLTRHQRIH
O43296_HUMAN  802   YECVECGKAFTRMSGLTRHKRIH
O43296_HUMAN  803   YECMECGKAFNRKSYLTQHQRIH
O14913_HUMAN  804   HECVECGKRFSSSSRLQEHQKIH
EVI1_HUMAN    805   HACPECGKTFATSSGLKQHKHIH
O15535_HUMAN  806   YECNECGKAFSRSSGLFNHRGIH
Z132_HUMAN    807   YECNDCGKAFSNSSTLIQHQKVH
Z132_HUMAN    808   YECIQCGKAFSERSTLVRHQKVH
Z132_HUMAN    809   YECDECGKAFSNRSHLIRHEKVH
Z124_HUMAN    810   YECQKCGKAFSRASTLWKHKKTH
ZN35_HUMAN    811   FKCNECEKAFSYSSQLARHQKVH
O60792_HUMAN  812   FECSECGKAFSYLSNLNQHQKTH
O75467_HUMAN  813   FRCSECGKAFSHGSNLSQHRKIH
O75467_HUMAN  814   FACPQCGRAFSHSSNLTQHQLLH
OZF_HUMAN     815   FACKVCGKVFSHKSNLTEHEHFH
Z132_HUMAN    816   YECSQCGKLFSHLCNLAQHKKIH
O60765_HUMAN  817   YECNTCGKLFNHRSSLTNHYKIH
O60792_HUMAN  818   YECAECGKAFRHCSSLAQHQKTH
O43336_HUMAN  819   CECSECGKCFRHRTSLIQHQKVH
O43336_HUMAN  820   CECNECGKVFSHQKRLLEHQKVH
O95878_HUMAN  821   YECTECGRTFSDISNFGAHQRTH
O60792_HUMAN  822   YECNECGKAFSQHSNLTQHQKTH
O43309_HUMAN  823   YHCNDCGKAFSQKAGLFHHIKIH
O43336_HUMAN  824   YECSDCGKAFISKQTLLKHHKIH
O60893_HUMAN  825   YECDDCGKTFSQSCSLLEHHKIH
O43338_HUMAN  826   FECDECGKSFSQRTTLNKHHKVH
O75123_HUMAN  827   YVCSYCGKGFIQRSNFLQHQKIH
O60792_HUMAN  828   YTCNECGKAFSQRGHFMEHQKIH
ZN42_HUMAN    829   YTCDVCGKVFSQRSNLLRHQKIH
O14709_HUMAN  830   YGCNDCSKVFRQRKNLTVHQKIH
O43361_HUMAN  831   YVCSECGKAFLTQAHLDGHQKIQ
O43361_HUMAN  832   YTCSECGKAFLTQAHLVGHQKIH
O43361_HUMAN  833   YECTQCGKAFLTQAHLVGHQKTH
Z157_HUMAN    834   YECGECAKTFSARSYLIAHQKTH
O75123_HUMAN  835   YECNECGKAFFLSSYLIRHQKIH
Q13398_HUMAN  836   YECNECGKFFTYYSSFIIHQRVH
O43361_HUMAN  837   YKCSKCGKFFRYRCTLSRHQKVH
O43361_HUMAN  838   YECNKCGKFFMYNSKLIRHQKVH
Z132_HUMAN    839   YECNECGKFFSQNSILIKHQKVH
Q13396_HUMAN  840   YECGYCGKSFSHPSDLVRHQRIH
O75467_HUMAN  841   YACPVCGKAFRHSSSLVRHQRIH
Z165_HUMAN    842   HQCNECGKAFRHSSKLARHQRIH
Z205_HUMAN    843   YHCLDCGKSFSHSSHLTAHQRTH
Z135_HUMAN    844   YACRDCGKAFTHSSSLTKHQRTH
Z135_HUMAN    845   YECNDCGKAFSHSSSLTKHQRIH
Z135_HUMAN    846   YQCGECGKAFSHSSSLTKHQRIH
ZN74_HUMAN    847   FDCSQCWKAFSCHSSLIMHQRIH
ZN74_HUMAN    848   YTCGECGKAFSCHSSLNVHQRIH
ZN35_HUMAN    849   YECKECGKAFSCFSHLIVHQRIH
O43309_HUMAN  850   YKCMECGKAFGRWSALNQHQRLH
ZN24_HUMAN    851   YGCVECGKAFSRSSILVQHQRVH
Z191_HUMAN    852   YGCVECGKAFSRSSILVQHQRVH
O43296_HUMAN  853   YKCSECGKAFSRSSSLTQHQRMH
ZN75_HUMAN    854   FKCQECGKSFRVSSDLIKHHRIH
O75290_HUMAN  855   FVCKECGMAFRYHYQLIEHCQIH
O75467_HUMAN  856   FVCTQCGRAFRERPALFHHQRIH
ZN74_HUMAN    857   FKCEKCGEMFNWSSHLTEHQRLH
ZN85_HUMAN    858   FKCTKCGKSFGMISCLTEHSRIH
ZN43_HUMAN    859   FKCKECGKSFCMLPHLAQHKIIH
Z195_HUMAN    860   FKCQECGKSFQMLSFLTEHQKIH
ZN07_HUMAN    861   FKCDECGKAFRWISRLSQHQLIH
Z189_HUMAN    862   HKCGECGKAFRLSTYLIQHQKIH
O75802_HUMAN  863   HKCGECGKAFRLSTYLIQHQKIH
ZN07_HUMAN    864   FKCTECGKAFRLSSKLIQHQRIH
O75290_HUMAN  865   FECKECGKAFTLLTKLVRHQKIH
O75290_HUMAN  866   FECKECGKVFSLPTQLNRHKNIH
O75290_HUMAN  867   FECRECGKAFSLLNQLNRHKNIH
O75290_HUMAN  868   FECKECEKAFSNRAHLIQHYIIH
O43296_HUMAN  869   FECKECGKAFSNRKDLIRHFSIH
O62425_CAEEL  870   FVCKVCGKAFRQASTLCRHKIIH
O75123_HUMAN  871   FECKDCGKAFIQSSKLLLHQIIH
O75290_HUMAN  872   FECKECGKFFRRGSNLNQHRSIH
O75290_HUMAN  873   FECKECGKSFNRSSNLVQHQSIH
O75290_HUMAN  874   FECKECGKSFNRSSNLVQHQSIH
O75290_HUMAN  875   FECQDCGKAFNRGSSLVQHQSIH
O94892_HUMAN  876   FVCSECRKAFSSKRNLIVHQRTH
O14709_HUMAN  877   FECSECGRAFSSNRNLIEHKRIH
Z135_HUMAN    878   YECNQCGRASARATLLIEHQRIH
Z157_HUMAN    879   FECQECGKAFCRKAHLTEHQRTH
Z157_HUMAN    880   FECNECGKAYCRKSNLVEHLRIH
O75123_HUMAN  881   FECNECGKAFIRSSKLIQHQRIH
ZN42_HUMAN    882   FRCAECGQSFRQRSNLLQHQRIH
ZN42_HUMAN    883   FACPECGQSFRQHANLTQHRRIH
ZN42_HUMAN    884   FACAECGQSFRQRSNLTQHRRIH
ZN42_HUMAN    885   --CAECGKAFRQRPTLTQHLRVH
ZN42_HUMAN    886   YACPECGKAFRQRPTLTQHLRTH
O14913_HUMAN  887   YKCEECGNSFYYPAMLKQHQRIH
Z174_HUMAN    888   YTCGECGNCFGRQSTLKLHQRIH
PLZF_HUMAN    889   YECEFCGSCFRDESTLKSHKRIH
BCL6_HUMAN    890   YPCEICGTRFRHLQTLKSHLRIH
O43296_HUMAN  891   FECLECGKAFNHRSYLKRHQRIH
O43337_HUMAN  892   YKCLECGKAFKRRSYLMQHHPIH
O43296_HUMAN  893   YECLECGKVFKHRSYLMWHQQTH
O75123_HUMAN  894   YECKECGKAFRHRSDLIEHQRIH
O43336_HUMAN  895   YECKECGKAFIHKKRLLEHQRIH
Z157_HUMAN    896   YECSECGNAFYVKVRLIEHQRIH
Z157_HUMAN    897   YECNECGNAFYVKARLIEHQRMH
OZF_HUMAN     898   FVCKECGKTFSGKSNLTEHEKIH
Z134_HUMAN    899   YKCSDCGKVFRHKSTLVQHESIH
O60893_HUMAN  900   YECEDCGKTFIGSSALVIHQRVH
O43339_HUMAN  901   YECSECGKLFRQNSSLVDHQKIH
O43338_HUMAN  902   FECSECGKFFRQSYTLVEHQKIH
O43338_HUMAN  903   YECGECGKLFRQSFSLWHQRIH
O43361_HUMAN  904   YECSECGKLFMDSFTLGRHQRVH
O43361_HUMAN  905   YECSECGKFFRDSYKLIIHQRVH
O43361_HUMAN  906   YECNECGKFFLDSYKLVIHQRIH
O43336_HUMAN  907   YECSECGKGFYLEVKLLQHQRIH
ZN07_HUMAN    908   YECAECGKVFRLCSQLNQHQRIH
Z132_HUMAN    909   HVCKECGKAFSHSSKLRKHQKFH
TYY1_HUMAN    910   HVCAECGKAFVESSKLKRHQLVH
O15391_HUMAN  911   HVCAECGKAFLESSKLRRHQLVH
O94892_HUMAN  912   HVCSECGKAFVKKSQLTDHERVH
ZFX_HUMAN     913   HICVECGKGFRHPSELKKHMRIH
ZFY_HUMAN     914   HICVECGKGFRYPSELRKHMRIH
Q15558_HUMAN  915   HICVECGKGFRHPSELRKHMRIH
Z135_HUMAN    916   YECHECLKGFRNSSALTKHQRIH
ZN74_HUMAN    917   YTCGECGKAFRQSSSLTLHRRWH
Z174_HUMAN    918   YQCGQCGKSFRQSSNLHQHHRLH
Z195_HUMAN    919   YQCEECGKVFRTCSSLSNHKRTH
HKR3_HUMAN    920   FQCHLCGKTFRTQASLDKHNRTH
O43337_HUMAN  921   YDCMACGKAFRCSSELIQHQRIH
O60765_HUMAN  922   YLCNECGNTFKSSSSLRYHQRIH
O60765_HUMAN  923   YKCNECGKTFRCNSSLSNHQRIH
Z140_HUMAN    924   YKCNECGKAFSSGSELIRHQITH
Q14585_HUMAN  925   YECKECGKAFSFGSGLIRHQIIH
Q14585_HUMAN  926   YICNECGKAFSFGSALTRHQRIH
Q14585_HUMAN  927   YECKECGKSFSSGSALNRHQRIH
Q14585_HUMAN  928   YECKACGMAFSSGSALTRHQRIH
Q14585_HUMAN  929   YECKECGKSFSFESALIRHHRIH
Q14585_HUMAN  930   YECKECGKTFSSGSDLTQHHRIH
Q14585_HUMAN  931   YVCKECGKAFNSGSDLTQHQRIH
Q14585_HUMAN  932   YECKECGKAFYSGSSLTQHQRIH
Q14585_HUMAN  933   FECKECGKAFGSGSNLTHHQRIH
Q14585_HUMAN  934   YECKECGKAFGSGANLAYHQRIH
Q14585_HUMAN  935   YECIDCGKAFGSGSNLTQHRRIH
Q14585_HUMAN  936   YECKECGKAFGSGSKLIQHQLIH
Q14585_HUMAN  937   YECKECEKAFRSGSKLIQHQRMH
ZN80_HUMAN    938   YECKECGKTFYYNSSLTRHMKIH
ZN80_HUMAN    939   YECKECGKGFYYSYSLTRHTRSH
Z165_HUMAN    940   YECNECGKSFAESSDLTRHRRIH
Z202_HUMAN    941   YKCTICGKSFSQKSVLTTHQRIH
O43167_HUMAN  942   YTCEICGKSFTAKSSLQTHIRIH
Q92618_HUMAN  943   HTCCICGKSFPFQSSLSQHMRKH
Q15776_HUMAN  944   HKCDECGKSFAQSSGLVRHWRIH
Q15535_HUMAN  945   HKCDECGKSFTQSSGLIRHQRIH
O60893_HUMAN  946   HYCHECGKSFAQSSGLTKHRRIH
ZN24_HUMAN    947   HICDECGKHFSQGSALILHQRIH
Z191_HUMAN    948   HICDECGKHFSQGSALILHQRIH
Z140_HUMAN    949   YACKECGKTFSQISNLVKHQMIH
Q14585_HUMAN  950   YECKECGKDFSFVSVLVRHQRIH
O75123_HUMAN  951   FECKECGKGFSQSSLLIRHQRIH
UKLF_HUMAN    952   FKCNHCDRCFSRSDHLALHMKRH
O95600_HUMAN  953   FRCTDCNRSFSRSDHLSLHRRRH
SP2_HUMAN     954   YACAQCQKRFMRSDHLTKHYKTH
SP4_HUMAN     955   YACPECSKRFMRSDHLSKHVKTH
O60402_HUMAN  956   YACPECSKRFMRSDHLSKHVKTH
O75411_HUMAN  957   YACPMCDRRFMRSDHLTKHARRH
Q13118_HUMAN  958   YACPMCDRRFMRSDHLTKHARRH
O14901_HUMAN  959   YACPVCDRRFMRSDHLTKHARRH
BTE1_HUMAN    960   YACPLCEKRFMRSDHLTKHARRH
SP2_HUMAN     961   FVCNWFFCGKRFTRSDELQRHARTH
SP4_HUMAN     962   FICNWMFCGKRFTRSDELQRHRRTH
O60402_HUMAN  963   FICNWMFCGKRFTRSDELQRHRRTH
EZF_HUMAN     964   YHCDWDGCGWKFARSDELTRHYRKH
O95600_HUMAN  965   YKCTWDGCSWKFARSDELTRHFRKH
UKLF_HUMAN    966   YKCSWEGCEWRFARSDELTRHYRKH
EKLF_HUMAN    967   YACTWEGCGWRFARSDELTRHYRKH
BTE2_HUMAN    968   YKCTWEGCDWRFARSDELTRHYRKH
O14901_HUMAN  969   FNCSWDGCDKKFARSDELSRHRRTH
Q13118_HUMAN  970   FSCSWKGCERRFARSDELSRHRRTH
O75411_HUMAN  971   FSCSWKGCERRFARSDELSRHRRTH
BTE1_HUMAN    972   FPCTWPDCLKKFSRSDELTRHYRTH
EGR4_HUMAN    973   FACPVESCVRSFARSDELNRHLRIH
EGR2_HUMAN    974   YPCPAEGCDRRFSRSDELTRHIRIH
EGR1_HUMAN    975   YACPVESCDRRFSRSDELTRHIRIH
EGR3_HUMAN    976   HACPAEGCDRRFSRSDELTRHLRIH
Q16256_HUMAN  977   YQCDFKDCERRFFRSDQLKRHQRRH
WT1_HUMAN     978   YQCDFKDCERRFSRSDQLKRHQRRH
Q15881_HUMAN  979   YQCDFKDCERRFSRSDQLKRHQRRH
Q15881_HUMAN  980   FQCKACQRKFSRSDHLKTHTRTH
Q16256_HUMAN  981   FQCKTCQRKFSRSDHLKTHTRTH
WT1_HUMAN     982   FQCKTCQRKFSRSDHLKTHTRTH
EGR4_HUMAN    983   FQCRICLRNFSRSDHLTSHVRTH
EGR3_HUMAN    984   FQCRICMRSFSRSDHLTTHIRTH
EGR2_HUMAN    985   FQCRICMRNFSRSDHLTTHIRTH
EGR1_HUMAN    986   FQCRICMRNFSRSDHLTTHIRTH
EVI1_HUMAN    987   YTCRYCGKIFPRSANLTRHLRTH
O95878_HUMAN  988   YRCTVCGKHFSRSSNLIRHQKTH
Z140_HUMAN    989   YVCKVCNKSFSWSSNLAKHQRTH
O60893_HUMAN  990   YECEECGKVFSHSSNLIKHQRTH
Z135_HUMAN    991   YECSECGKSFSFRSSFSQHERTH
O95878_HUMAN  992   YICCECGKSFSNSSSFGVHHRTH
ZN80_HUMAN    993   CKCSECGKTFTYRSVFFRHSMTH
ZN80_HUMAN    994   YECSECGKTFSYHSVFIQHRVTH
Z135_HUMAN    995   YGCNECGKSFSHSSSLSQHERTH
Z135_HUMAN    996   YGCNECGKTFSHSSSLSQHERTH
Z263_HUMAN    997   YKCPECGKSFSRSSHLVIHERTH
Z263_HUMAN    998   YKCSECGESFSRSSRLMSHQRTH
Z202_HUMAN    999   CRCNECGKSFSRRDHLVRHQRTH
ZN74_HUMAN    1000  FKCSDCEKAFNSRSRLTLHQRTH
ZN42_HUMAN    1001  FACPECGQRFSQRLKLTRHQRTH
Z205_HUMAN    1002  YPCPECGKCFSQRSNLIAHNRTH
ZN75_HUMAN    1003  FKCDECGKRFIQNSHLIKHQRTH
ZN07_HUMAN    1004  FKCDECGKGFVQGSHLIQHQRIH
O15090_HUMAN  1005  YPCPLCGKRFRFNSILSLHMRTH
O94892_HUMAN  1006  YRCSECGKGFIVNSGLMLHQRTH
O95270_HUMAN  1007  HKCQVCGKAFSQSSNLITHSRKH
GFI1_HUMAN    1008  HKCQVCGKAFSQSSNLITHSRKH
Z135_HUMAN    1009  YKCQECGKAFSHSSALIEHHRTH
O60765_HUMAN  1010  FKCKECSKAFSQSSALIQHQITH
O60765_HUMAN  1011  CKCKVCGKAFRQSSALIQHQRMH
O60792_HUMAN  1012  CKCNECGKAFSYCSALIRHQRTH
Z151_HUMAN    1013  YVCERCGKRFVQSSQLANHIRHH
EVI1_HUMAN    1014  YECENCAKVFTDPSNLQRHIRSQH
Z205_HUMAN    1015  YVCDRCAKRFTRRSDLVTHQGTH
Z205_HUMAN    1016  HKCPICAKCFTQSSALVTHQRTH
Z124_HUMAN    1017  YGCTICEKVFNIPSSFQIHQRNH
Z200_HUMAN    1018  YTCPLCGKQFNESSYLISHQRTH
O15361_HUMAN  1019  YTCPLCGKQFNESSYLISHQRTH
ZN07_HUMAN    1020  YKCNKCTKAFGCSSRLIRHQRTH
Z263_HUMAN    1021  YQCNICGKCFSCNSNLHRHQRTH
Q13134_HUMAN  1022  YKCELCPYSSSQKTHLTRHMRTH
Q13127_HUMAN  1023  YKCELCPYSSSQKTHLTRHMRTH
CTCF_HUMAN    1024  FQCSLCSYASRDTYKLKRHMRTH
Q99592_HUMAN  1025  YTCSLCGKTFSCMYTLKRHERTH
Q13397_HUMAN  1026  YTCSLCGKTFSCMYTLKRHERTH
O60765_HUMAN  1027  YKCSLCEKTFINTSSLRKHEKNH
ZN74_HUMAN    1028  YKCSACEKAFSCSSLLSMHLRVH
ZN75_HUMAN    1029  YKCQQCDRRFRWSSDLNKHFMTH
Z189_HUMAN    1030  YQCNQCKQSFSQRRSLVKHQRIH
O75802_HUMAN  1031  YQCNQCKQSFSQRRSLVKHQRIH
Z186_HUMAN    1032  YACNCCEKLFSYKSSLTIHQRIH
Z186_HUMAN    1033  YACDHCEKAFSHKSKLTVHQRTH
ZN84_HUMAN    1034  YECRDCEKAFSQKSQLNTHQRIH
O60792_HUMAN  1035  YQCNKCEKTFSQSSHLTQHQRIH
O75066_HUMAN  1036  YACQYCDAVFAQSIELSRHVRTH
O95878_HUMAN  1037  YRCDICGKSFSQSATLAVHHRTH
P91805_SARPE  1038  YQCKVCQKRFPQLSTLHNHERTH
Z133_HUMAN    1039  YACKECGRCFRQRTTLVNHQRTH
Z133_HUMAN    1040  YVCGVCGHSFSQNSTLISHRRTH
O43336_HUMAN  1041  YVCIECGKSLSSKYSLVEHQRTH
O75467_HUMAN  1042  YACAQCGRRFCRNSHLIQHERTH
Z124_HUMAN    1043  YECKQCGKAFSRSSHLRDHERTH
Z177_HUMAN    1044  YECNQCGKSFSTGSYLIVHKRTH
Z177_HUMAN    1045  YECDHCGKSFSQSSHLNVHKRTH
ZN84_HUMAN    1046  YACGNCGKTFPQKSQFITHHRTH
Z135_HUMAN    1047  YECHECGKAFTQITPLIQHQRTH
Z135_HUMAN    1048  YECNQCGRAFSQLAPLIQHQRIH
Z135_HUMAN    1049  YKCTQCGRTFNQIAPLIQHQRTH
O60893_HUMAN  1050  YQCDTCGKGFTRTSYLVQHQRSH
O43337_HUMAN  1051  YKCKQCGKGFNRKWYLVRHQRVH
Z205_HUMAN    1052  YRCEQCGKGFSWHSHLVTHRRTH
Z202_HUMAN    1053  YRCDDCGKHFRWTSDLVRHQRTH
ZN45_HUMAN    1054  YRCDVCGKRFRQRSYLQAHQRVH
ZN45_HUMAN    1055  YQCDACGKGFSRSSDFNIHFRVH
Z239_HUMAN    1056  YQCYECGKGFSQSSDLRIHLRVH
Z239_HUMAN    1057  YKCDKCGKGFSQSSKLHIHQRVH
Z239_HUMAN    1058  YHCGKCGKGFSQSSKLLIHQRVH
Z239_HUMAN    1059  YKCGECGKGFSQSSNLHIHRCIH
O15322_HUMAN  1060  YKCDMCGKEFSQSSCLQTHERVH
Z239_HUMAN    1061  YACQYCGKNFSQSSELLLHQRDH
ZN07_HUMAN    1062  YPCKECGKAFSQSSTLAQHQRMH
Z133_HUMAN    1063  YVCKTCGRGFSLKSHLSRHRKTH
Z133_HUMAN    1064  YVCGVCGRGFSLKSHLNRHQNIH
Z133_HUMAN    1065  YVCGVCEKGFSLKKSLARHQKAH
EVI1_HUMAN    1066  YRCKYCDRSFSISSNLQRHVRNIH
RRE1_HUMAN    1067  YKCQTCERTFTLKHSLVRHQRIH
O75850_HUMAN  1068  YACAQCGRRFSRKSHLGRHQAVH
O75850_HUMAN  1069  HACAVCARSFSSKTNLVRHQAIH
O75850_HUMAN  1070  YQCAQCARSFTHKQHLVRHQRVH
ZN42_HUMAN    1071  FVCSECGRSFSRSSHLLRHQLTH
Z132_HUMAN    1072  FECSECGRDFSQSSHLLRHQKVH
ZN35_HUMAN    1073  YECEKCGAAFISNSHLMRHHRTH
Z132_HUMAN    1074  YECSECGRAFSSNSHLVRHQRVH
Z202_HUMAN    1075  YKCMECGKSYTRSSHLARHQKVH
Z134_HUMAN    1076  YECSECGKAYSLSSHLNRHQKVH
Z239_HUMAN    1077  YECSKCGKGFSQSSNLHSHQRVH
Z165_HUMAN    1078  YECSECGRAFSQSSNLSQHQRIH
Z132_HUMAN    1079  YECSECGRAFNNNSNLAQHQKVH
Z239_HUMAN    1080  YECEECGMSFSQRSNLHIHQRDH
O00153_HUMAN  1081  HQCQVCGKTFSQSGSRNVHMRKH
Q13398_HUMAN  1082  YVCGECGKSFSHSSNLKNHQRVH
O15322_HUMAN  1083  YKCEICGKSFCLRSSLNRHYMVH
O75123_HUMAN  1084  FKCAQCGKAFCHSSDLIRHQRVH
O14913_HUMAN  1085  YKCEECDKAFLYHSFLRRHKAVH
O14913_HUMAN  1086  YKCEECDKAFLHHSYLRKHQAVH
ZN83_HUMAN    1087  FKCNECGKLFRDNSYLVRHQRFH
O15322_HUMAN  1088  HTCNECGKSFCYISALRIHQRVH
O60792_HUMAN  1089  FGCNDCGKSFRYRSALNKHQRLH
Z137_HUMAN    1090  YKCNKCGKIFRHRSYLAVYQRTH
O75123_HUMAN  1091  YVCJWCGKDFIHYSGLIEHQRVH
Z134_HUMAN    1092  YKCNECGKYFSHHSNLIVHQRVH
O43361_HUMAN  1093  FECSICGKFFSHRSTLNMHQRVH
Z134_HUMAN    1094  FECIECGKFFSRSSDYIAHQRVH
Z134_HUMAN    1095  FVCSKCGKDFIRTSHLVRHQRVH
O14913_HUMAN  1096  YKCQECGKSFCYRSYLREHYRMH
Z174_HUMAN    1097  YKCDDCGKSFTWNSELKRHKRVH
O60765_HUMAN  1098  YRCKECGKSFSRRSGLFIHQKIH
O43167_HUMAN  1099  YSCGICGKSFSDSSAKRRHCILH
O43829_HUMAN  1100  FVCEMCTKGFTTQAHLKEHLKIH
O00403_HUMAN  1101  FVCEMCTKGFTTQAHLKEHLKIH
O75626_HUMAN  1102  FKCQTCNKGFTQLAHLQKHYLVH
O15322_HUMAN  1103  FKCEQCGKGFRCRAILQVHCKLH
BCL6_HUMAN    1104  YKCETCGARFVQVAHLRAHVLIH
Z195_HUMAN    1105  YKCEKCGKAFTQFSHLTVHESIH
ZN85_HUMAN    1106  YKCKKCGKAFNQSAHLTTHEVIH
Z239_HUMAN    1107  YKCEKCGKGFTRSSSLLIHHAVH
Z239_HUMAN    1108  YKCEQCGKGFTRSSSLLIHQAVH
O15322_HUMAN  1109  YKCEECGKGFTDSLDLHKHQIIH
O15322_HUMAN  1110  YICEKCGRAFIHDLKLQKHQIIH
Q14913_HUMAN  1111  YKCEKCGKGFFRSSDLQHHQKIH
O14913_HUMAN  1112  YKCEECGKCFSSFTSLKRHQIIH
O14913_HUMAN  1113  YPYKCEECGKGFSRSSKLQEHQTIH
ZN45_HUMAN    1114  YKGEHCVKSFSWSSHLQINQRAH
ZN45_HUMAN    1115  YKCEECGKGFSWSSSLIIHQRVH
ZN45_HUMAN    1116  YKCEECGKVFSWSSYLQAHQRVH
ZN45_HUMAN    1117  YKCEKCDNAFRRFSSLQAHQRVH
ZN45_HUMAN    1118  YKCERCGKAFSQFSSLQVHQRVH
ZN45_HUMAN    1119  YKCEECGVGFSQRSYLQVHLKVH
ZN45_HUMAN    1120  YKCEECGKSFSWRSRLQAHERIH
ZN45_HUMAN    1121  YKCEECGKGFSVGSHLQAHQISH
ZN45_HUMAN    1122  YQCAECGKGFSVGSQLQAHQRCH
ZN45_HUMAN    1123  YQCEECGKGFCRASNFLAHRGVH
ZN45_HUMAN    1124  YKCEECGKGFCRASNLLDHQRGH
ZN45_HUMAN    1125  YKCEECGKGFSQASNLLAHQRGH
O75467_HUMAN  1126  FVCALCGAAFSQGSSLFKHQRVH
ZN42_HUMAN    1127  YHCGECGLGFTQVSRLTEHQRIH
O60765_HUMAN  1128  YRCNECGKGFTSISRLNRHRIIH
TYY1_HUMAN    1129  YVCPFDGCNKKFAQSTNLKSHILTH
O15391_HUMAN  1130  FVCPFDVCNRKFAQSTNLKTHILTH
TYY1_HUMAN    1131  FQCTFEGCGKRFSLDFNLRTHVRIH
O15391_HUMAN  1132  FQCTFEGCGKRFSLDFNLRTHLRIH
Q14872_HUMAN  1133  YQCTFEGCPRTYSTAGNLRTHQKTH
GLI1_HUMAN    1134  HKCTFEGCRKSYSRLENLKTHLRSH
GLI3_HUMAN    1135  HKCTFEGCTKAYSRLENLKTHLRSH
O60255_HUMAN  1136  HKCTFEGCSKAYSRLENLKTHLRSH
O60254_HUMAN  1137  HKCTFEGCSKAYSRLENLKTHLRSH
O60253_HUMAN  1138  HKCTFEGCSKAYSRLENLKTHLRSH
O60252_HUMAN  1139  HKCTFEGCSKAYSRLENLKTHLRSH
GLI2_HUMAN    1140  HKCTFEGCSKAYSRLENLKTHLRSH
O95409_HUMAN  1141  FQCEFEGCDRRFANSSDRKKHMHVH
Q15915_HUMAN  1142  FKCEFEGCDRRFANSSDRKKHMHVH
ZIC3_HUMAN    1143  FKCEFEGCDRRFANSSDRKKHMHVH
GLI1_HUMAN    1144  YMCEHEGCSKAFSNASDRAKHQNRTH
O60255_HUMAN  1145  YVCEHEGCNKAFSNASDRAKHQNRTH
O60254_HUMAN  1146  YVCEHEGCNKAFSNASDRAKHQNRTH
O60253_HUMAN  1147  YVCEHEGCNKAFSNASDRAKHQNRTH
O60252_HUMAN  1148  YVCEHEGCNKAFSNASDRAKHQNRTH
GLI3_HUMAN    1149  YVCEHEGCNKAFSNASDRAKHQNRTH
GLI2_HUMAN    1150  YVCEHEGCNKAFSNASDRAKHQNRTH
Z143_HUMAN    1151  YVCTVPGCDKRFTEYSSLYKHHWH
TF3A_HUMAN    1152  FKCTQEGCGKHFASPSKLKRHAKAH
TF3A_HUMAN    1153  FVCDYEGCGKAFIRDYHLSRHILTH
Q14872_HUMAN  1154  FECDVQGCEKAFNTLYRLKAHQRLH
Q14872_HUMAN  1155  FVCNQEGCGKAFLTSHSLRIHVRVH
ZN76_HUMAN    1156  YRCDFPSCGKAFATGYGLKSHVRTH
Z143_HUMAN    1157  YQCEHAGCGKAFATGYGLKSHVRTH
Q14872_HUMAN  1158  FRCDHDGCGKAFAASHHLKTHVRTH
O00153_HUMAN  1159  FICPAEGCGKSFYVLQRLKVHMRTH
ZN76_HUMAN    1160  FQCPFEGCGRSFTTSNIRKVHVRTH
Z143_HUMAN    1161  FKCPFEGCGRSFTTSNIRKVHVRTH
Q15915_HUMAN  1162  FPCPFPGCGKVFARSENLKIHKRTH
O95409_HUMAN  1163  FPCPFPGCGKVFARSENLKIHKRTH
ZIC3_HUMAN    1164  FPCPFPGCGKIFARSENLKIHKRTH
ZN76_HUMAN    1165  YTCPEPHCGRGFTSATNYKNHVRIH
Z143_HUMAN    1166  YYCTEPGCGRAFASATNYKNHVRIH
O00153_HUMAN  1167  FMCHESGCGKQFTTAGNLKNHRRIH
ZN76_HUMAN    1168  YKCPEELCSKAFKTSGDLQKHVRTH
Z143_HUMAN    1169  YRCSEDNCTKSFKTSGDLQKHIRTH
Q14872_HUMAN  1170  FNCESEGCSKYFTTLSDLRKHIRTH
ZN76_HUMAN    1171  FRCGYKGCGRLYTTAHHLKVHERAH
Z143_HUMAN    1172  FRCEYDGCGKLYTTAHHLKVHERSH
BTE1_HUMAN    1173  HKCPYSGCGKVYGKSSHLKAHYRVH
BTE2_HUMAN    1174  HYCDYPGCTKVYTKSSHLKAHLRTH
O43839_HUMAN  1175  HRCHFNGCRKVYTKSSHLKAHQRTH
UKLF_HUMAN    1176  HRCQFNGCRKVYTKSSHLKAHQRTH
O95600_HUMAN  1177  HQCDFAGCSKVYTKSSHLKAHRRIH
Q13118_HUMAN  1178  HICSHPGCGKTYFKSSHLKAHTRTH
O75411_HUMAN  1179  HICSHPGCGKTYFKSSHLKAHTRTH
EZF_HUMAN     1180  HTCDYAGCGKTYTKSSHLKAHLRTH
O14901_HUMAN  1181  YVCSFPGCRKTYFKSSHLKAHLRTH
SP4_HUMAN     1182  HICHIEGCGKVYGKTSHLRAHLRWH
O60402_HUMAN  1183  HICHIEGCGKVYGKTSHLRAHLRWH
EKLF_HUMAN    1184  HTCAHPGCGKSYTKSSHLKAHLRTH
WT1_HUMAN     1185  FMCAYPGCNKRYFKLSHLQMHSRKH
Q16256_HUMAN  1186  FMCAYPGCNKRYFKLSHLQMHSRKH
Q15881_HUMAN  1187  FMCAYPGCNKRYFKLSHLQMHSRKH
SP2_HUMAN     1188  HVCHIPDCGKTFRKTSLLRAHVRLH
O43167_HUMAN  1189  YACKDCHRKFMDVSQLKKHLRTH
O75467_HUMAN  1190  YACRACSKVFVKSSDLLKHLRTH
ZEP1_HUMAN    1191  YICEYCNRACAKPSVLLKHIRSH
Q02646_HUMAN  1192  YICPYCSRACAKPSVLKKHIRSH
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0753 62_HUMAN 


1193 


Q92981_HUMAN


1194


O76019_HUMAN


1195


RRE1_HUMAN


1196


O75626_HUMAN


1197


Z202_HUMAN


1198


O75123_HUMAN


1199


Z151_HUMAN


1200


SNAI_HUMAN


1201


O43623_HUMAN


1202


O95409_HUMAN


1203


ZIC3_HUMAN


1204


O00146_HUMAN


1205


O00146_HUMAN


1206


IKAR_HUMAN


1207


CTCF_HUMAN


1208


HKR3_HUMAN


1209


Q15552_HUMAN


1210


O43591_HUMAN


1211


PLZF_HUMAN


1212


Z151_HUMAN


1213


MAZ_HUMAN


1214


O14753_HUMAN


1215


O95365_HUMAN


1216


O15156_HUMAN


1217


O75066_HUMAN


1218


O95365_HUMAN


1219


O15156_HUMAN


1220


Z151_HUMAN


1221


Z151_HUMAN


1222


Z151_HUMAN


1223


O15090_HUMAN


1224 
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YACSYCGKFFRSNYYLNIHLRTH 

YKCVQPDCGKAFVSRYKLMRHMATH 

YKCVQPDCGKAFVSRYKLMRHMATH

YACSVCNKRFWSLQDLTRHMRSH 

HECQVCHKRFSSTSNLKTHLRLH

HDCSVCGKSFTCNSHLVRHLRTH

YACDICGKTFTFNSDLVRHRISH

HKCSVCSKAFVNVGDLSKHIIIH

YACVCGTCGKAFSRPWLLQGHVRTH 

YACVCKICGKAFSRPWLLQGHIRTH 

HVCFWEECPREGKPFKAKYKLVNHIRVH 

HVCYWEECPREGKSFKAKYKLVNHIRVH

HECKLCGASFRTKGSLIRHHRRH

HVCQFCSRGFREKGSLVRHVRHH

FQCNQCGASFTQKGNLLRHIKLH 

HKCHLCGRAFRTVTLLRNHLNTH 

HVCEFCSHAFTQKANLNMHLRTH 

HVCEHCNAAFRTNYHLQRHVFIH

HVCEHCNAAFRTNYHLQRHVFIH 

YICSECNRTFPSHTALKRHKRSH 

YVCIHCQRQFADPGALQRHVRIH

YICALCAKEFKNGYNLRRHEAIH

HLCTGCGKGFNDTFDLKRHVRTH

YECNICKVRFTRQDKLKVHMRKH

YACEVCGVRFTRNDKLKIHMRKH 

YSCEECGAKFAANSTLKNHLRLH 

YLCQQCGAAFAHNYDLKNHMRVH 

YSCPHCPARFLHSYDLKNHMHLH

HKCEDCGKEFTHTGNFKRHIRIH 

YRCEDCGKLFTTSGNLKRHQLVH 

YKCRECGKQFTTSGNLKRHLRIH 

YDCPYCGKTFRTSHHLKVHLRIH 
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Example 3: Non-human zinc finger databases. 

To provide novel combinations of non-antigenic, optimised zinc fingers for use in
species other than humans, separate species-specific zinc finger databases are required,
for example for mouse, chicken, pig, cow, etc.

The fingers listed below are in a format that can be linked with classical wild-type
canonical "TGEKP" linkers (i.e. ...TGEKP - zinc finger peptide sequence - TGEKP -
zinc finger peptide sequence - TGEKP - etc.). For each peptide sequence, an
oligonucleotide is designed to encode the peptide sequence; the oligonucleotide can then
be linked into a library selection system, as described in the Examples infra.
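
The linking format and the encoding step described above can be illustrated with a minimal Python sketch. It is an illustration only, not the procedure of the Examples: the function names `join_fingers` and `back_translate` and the single-codon table `CODON` are hypothetical, and the codon choice (one arbitrary standard-code codon per amino acid) is an assumption; a real oligonucleotide design would also consider codon usage, restriction sites and overlaps.

```python
# Minimal sketch (assumptions: function names and the one-codon-per-residue
# table are hypothetical; only the "TGEKP" linker comes from the text).

LINKER = "TGEKP"  # canonical wild-type linker named in the text

# One standard-genetic-code codon per amino acid (arbitrary choice).
CODON = {
    "A": "GCG", "C": "TGC", "D": "GAT", "E": "GAA", "F": "TTT",
    "G": "GGC", "H": "CAT", "I": "ATT", "K": "AAA", "L": "CTG",
    "M": "ATG", "N": "AAC", "P": "CCG", "Q": "CAG", "R": "CGT",
    "S": "AGC", "T": "ACC", "V": "GTG", "W": "TGG", "Y": "TAT",
}


def clean(peptide: str) -> str:
    """Remove whitespace and check that only the 20 standard residues remain."""
    seq = "".join(peptide.split()).upper()
    bad = sorted(set(seq) - set(CODON))
    if bad:
        raise ValueError(f"unexpected characters in peptide: {bad}")
    return seq


def join_fingers(fingers, linker=LINKER):
    """Concatenate finger units as finger - linker - finger - linker - ..."""
    return linker.join(clean(f) for f in fingers)


def back_translate(peptide: str) -> str:
    """Return one DNA sequence encoding the peptide (not a unique solution)."""
    return "".join(CODON[aa] for aa in clean(peptide))


if __name__ == "__main__":
    # Two finger units taken from the mouse listing (SEQ ID NO: 1225 and 1226).
    array = join_fingers(["HQCTHCEKTFNRKDHLKNHLQTH", "HRCEYCKKGFRRPSEKNQHIMRH"])
    print(array)
    print(back_translate(array))
```

Entries in the listings that contain gap characters ("-") are rejected by `clean` rather than silently altered, since the sketch makes no assumption about what the missing residues are.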

Mouse Zinc Finger Database. 

544 zinc finger units



Name 


SEQ ID NO 


Peptide sequence 


O35745_MOUSE


1225


HQCTHCEKTFNRKDHLKNHLQTH


ZFX2_MOUSE


1226


HRCEYCKKGFRRPSEKNQHIMRH


ZFX1_MOUSE


1227


HRCEYCKKGFRRPSEKNQHIMRH


ZFY2_MOUSE


1228


HKCDMCSKGFHRPSELKKHVATH


ZFY1_MOUSE


1229


HKCDMCSKGFHRPSELKKHVATH


ZFX2_MOUSE


1230


HKCDMCDKGFHRPSELKKHVAAH


ZFX1_MOUSE


1231


HKCDMCDKGFHRPSELKKHVAAH


ZFA_MOUSE


1232


HKCDMCDKGFHRPSELKKHVAAH


Q9Z162_MOUSE


1233


YTCSVCGKGFSRPDHLSCHVKHVH


MAZ_MOUSE


1234


YNCSHCGKSFSRPDHLNSHVRQVH


Q08376_MOUSE


1235


YSCEVCGKSFIRAPDLKKHERVH


Z151_MOUSE 


1236 


HKCPHCDKKFNQVGNLKAHLKIH 


ZFX2_MOUSE 


1237 


FRCKRCRKGFRQQSELKKHMKTH 


ZFX1_MOUSE


1238


FRCKRCRKGFRQQSELKKHMKTH


Q62518_MOUSE


1239


YVCTMCGKGYTLNSNLQVHLRVH


Q60636_MOUSE


1240


YECNVCAKTFGQLSNLKVHLRVH


Q9Z117_MOUSE


1241


CSCPECGKVLHQLSHLRSHYRLH


Q61898_MOUSE


1242


CSCPECGREFHQLSHLRKHYRLH


O88631_MOUSE


1243


YSCQYCGKVFHQLSHFKSHFTLH


Q61164_MOUSE


1244


HKCPDCDMAFVTSGELVRHRRYKH


O35483_MOUSE


1245


FRCADCGRGFAQRSNLAKHRRGH


O35483_MOUSE


1246


FVCGVCGAGFSRRAHLTAHGRAH


O70162_MOUSE


1247


FVCRDCGQGFVRSARLEEHRRVH


Q9Z1D8_MOUSE


1248


HRCGDCGKFFLQASNFIQHRRIH


O35483_MOUSE


1249


HRCPDCGKGFGHSSDFKRHRRTH


O35483_MOUSE


1250


ADCGKSFVYGSHLARHRRTH
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CG54 83 _MOUSE 


1251 


O88282_MOUSE


1252


Q61065_MOUSE


1253 
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Q9Z1D7JMOUSE 
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ZF9 0_MOUSE 


1266 


Q9Z2X6_MOUSE 


1267 


Q9Z2X6_MOUSE
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Q9Z2X6JVIOUSE 


1270 


Q9Z2X6_MOUSE
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Q9Z2X6JVLOUSE 


1272 
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ZF3 7JMOUSE 
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Q62 514__MOUSE 
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Q614 91_MOUSE 


1276 


ZF37_MOUSE


1277


Q62514_MOUSE


1278


Q61491_MOUSE


1279


Q61491_MOUSE


1280


Q61491_MOUSE


1281


Q61491_MOUSE


1282


Q61491_MOUSE


1283


Q61491_MOUSE


1284


Q61491_MOUSE


1285


Q61491_MOUSE


1286


Q61491_MOUSE


1287


Q9Z2X6_MOUSE


1288


Q9Z2X6_MOUSE


1289


Q61491_MOUSE


1290


Q9Z2X6_MOUSE


1291


Q61491_MOUSE


1292


Q64247_MOUSE


1293 
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1295 
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Q6424 7_MOUSE 


1297 
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FPCPDCGKRFVYKSHLVTHRRIH 
YKCQLCRSAFRYKGNLASHRTVH 
YKCDRCQASFRYKGNLASHKTVH 
YKCDRCQASFRYKGNLASHKTVH
FACQDCGRRFNQSTKLIQHQRVH
--CVECGERFGRRSVLLQHRRVH
-DCPVCNKKFKMKHHLTEHMKTH
---HMCDKAFKHKSHLKDHERRH
HECGICRKAFKHKHHLIEHMRLH
FKCTECGKAFKYKHHLKEHLRIH
FKCNECGKGFGRRSHLAGHLRLH
YGCNECGKSFGRHSHLIEHLKRH

YVCKQCGKAFTLSSSLRRH

YVCKECGKAFTLSTSLYKHLRTH
HGCDECGKSFTQHSRLIEHKRVH
YRCNLCGRSFRHSTSLTQHEVTH
YVCKECGKAFARSTSLHIHEGTH
YVCKHCGKAYTTYNTLRAHERSH
YVCKHCGKAYTTYNTLRAHERSH
YVCKHCGKAYTSYSTLRAHERSH
YVCKHCGKAYTSYSTLRAHERSH
YVCKHCGKAYTSYSTLRAHERSH
YVCKHCGKAFTQSSYLRIHKRTH
YECEQCGKAHGHKHALTDHLRIH
YECEQCGKAHGHKHALTDHLRIH
YECNQCGKAFTQFFPLKRHEITH
YKCDECGKAFGHSSSLTYHMRTH
YKCDECGKAFGHSSSLTYHMRTH
YQCNQCAKAFPYHRTLQIHERTH
CEYNQCWKAFAYHKTLQIHERTH
YECNQCGKAFACYQSFQIHKRTH
YECNQCGKAFACNRYLQIHKRTH
YECNQCGKAFACPRYLQIHKRTH
YECNQCGKAFACLRNLQNHKTTH
FECNQCGKAFAHHSTLQRHKRTH
YECNQCGKAFTRHSTLQIHKRTH
YECNQCGKAFTCRSNLQIHKRTH
YVCKQCGKAFTRSSHLQIHKITH
YICKQCGKAFARSSHLQIHKRSH
YKCKQCGKDFTHHSTLHIHKRIH
YSCKLCGKAFTHSNYLQIHKRIH
YECNQCGKAFARNSNLLDHKRIH
YICKQCGKTFRYLSCFQKHERIH
YACKQCDKAFKYLSSLQNHKRIH
HACKQCGKSFKRQSNVQAHERNH
YTCKHCTKTFTTSSTRNSHEKTH
YACKHCGKAFTTSSARNSHERIH 
YACKHCGKAFTSSSDRNSHERIH 
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Q64247JVIOUSE 1300 
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Q9Z0Q5_MOUSE 1309 
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Q9Z162_MOUSE 1315 
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Q618 98_MOUSE 1317 
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Q9ZlD9_MOUSE 1321 
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Q60585JVLOUSE 1324 
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GFI1JYIOUSE 13 2 8 

Q61624_MOUSE 1329

P97475_MOUSE 1330

Q61624_MOUSE 1331

P97475_MOUSE 1332

ZFP1_MOUSE 1333

Q9Z116_MOUSE 1334

Q9Z116_MOUSE 1335

ZFP1_MOUSE 1336

Q06054_MOUSE 1337

Q06054_MOUSE 1338

Q06054_MOUSE 1339

Q06054_MOUSE 1340

Q06054_MOUSE 1341

Q06054_MOUSE 1342

Q06054_MOUSE 1343

Q06054_MOUSE 1344

MLZ4_MOUSE 1345

ZF37_MOUSE 1346



YPCKYCGKAFATSSDRNSHERIH 
YSCTHCGKAFSSPSDYNSCERIH 
YVCNECGKAFTCSSYLLIHQRIH
YMCNHCYKHFSQSSDLIKHQRIH
YVCKQCGKAFAQSSYLHIHQRSH
YQCKDCGKAFSGKGSLIRHYRIH
YECNKCGKAFSRITSLIVHVRIH
YECNECGKAFSQRTSLIVHVRIH
YQCNVCGKAFKRSTSFIEHHRIH
YECKICGKAFCQSSSLTVHMRSH
YECNVCGKAFSQSSSLTVHVRSH
YECIDCGKAFSQSSSLIQHERTH
CQCVICGKAFTQASSLIAHVRQH
YECKGCGKAFIQKSSLIRHQRSH
FECKDCGKAFIQKSNLIRHQRTH

TYCSKAFRDSYHLRRHQSCH

HACEMCGKAFRDVYHLNRHKLSH
HACEMCGKAFRDVYHLNRHKLSH
FRCTECDKSFIRSSHLREHQKIH
FDCKECGKTFSRGYHLTLHQRIH
YACAECGRRFGQSAALTRHQWAH
YACTECGKSFRQVAHLTRHQRLN
YACPECGECFRQSSHLSRHQRTH
YKCFQCGERFRQSTHLVRHQRIH
YKCTKCDKLFTQYSHLRRHQRIY
YKCTECKKAFRQHSHLTYHQRIH
HKCTECAKASAASPHLIQHQRTH
YECTECSKAFCQKSHLTQHQRVH
YPCQFCGKRFHQKSDMKKHTYIH
YPCQYCGKRFHQKSDMKKHTFIH
FRCDECGMRFIQKYHMERHKRTH
FRCDECGMRFIQKYHMERHKRTH
FQCSQCDMRFIQKYLLQRHEKIH
FQCSQCDMRFIQKYLLQRHEKIH
FVCNYCDKTFSFKSLLVSHKRIH
YICFECRKAFYRKSELTDHQRIH
YECKECGKAFCQKPQLTLHQRIH
YGCSECGKTFAQKFELTTHQRIH
YKCSDCGKCFIQKANLRTHQKIH
YKCSDCGKCFIQKANLRTHERIH
YKCSDCDKCFIQKAKLKKHQRIH
YKCSECDKCFIQKDHLRTHQRLH
YKCSECDKCFIRKANLRRHHRIH
YKCSECHKCFIRKAHLRRHQRIH
YKCSECHKCFIQQAHLRRHQKIH
YICAECNKCFIQKSQLKTHQRIH
HICSQCGKAFSQISDLNRHQKTH
YECNECGIAFSQKSHLWHQRTH
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ZF3 8JVIOUSE 13 69 
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YE CNE CGIAFSQKS HL VLHQRTH 
YECVECGKAFSQKSHLIVHQRPH
YECVECGKAFSQKSHLIVHQRTH
FECNECGKTFSKKSHLVIHQRTH
FECNECGKTFSKKSHLVIHQRTH
FECKECGKAFHFSSQLNNHKTSH
FECYECGKAFNAKSQLVIHQRSH
PECYECGKAFNAKSQLVIHQRSH
YECKICGKCFYWKTSFNRHQSTH
YSCNECGKAFRQKSSLTVHQRTH
YECAECGKAFSTKSYLTVHQRTH
YECSKCGKTFRGKYSLDQHQRVH
HECADCGKTFLWRTQLTEHQRIH
YECMICGKHFTGRSSLTVHQVIH
YECDQCGKAFIKNSSLIVHQRIH
YKCSVCGKAFIQKISLIEHEQIH
YKCDTCGKAFSQKSSLQVHQRIH
--CRMCGKAFKRSSTLSTHLLIH
-DCKICGKSPKRSSTLSTHLLIH
HSCGICGKCFTQKSTLHDHLNLH
YKCEVCGKTFRWRTVLIRHKWH
-YKCMCGKAFSQCSAFTLHQRIH
YKCKECGKAFNHSSNFNKHHRIH
YGCNECGKAFSQFSTLALHMRIH
YGCNECGKAFSQFSTLALHLRIH
YECTECGKTFSQRSTLRLHLRIH
YKCDECGKNFSQNSDLVRHRRAH
YECNECGKAFKYGSSLTKHMRIH
YECNECGKAFKYGSSLTKHMRIH
YKCHDCGKAFSKNSSLTQHRRIH
CRDCGKFFSQTSHLNDHRRIHTG
YKCSTCGKGFSRSSDLNVHCRIH
YLCQQCGKSFSRSFNLIKHRIIH
YACKECGESFSYNSNLIRHQRIH
YRCSICGARFNRPANLKTHSRIH
YRCNICGAQFNRPANLKTHTRIH
YRCNICGAQFNRPANLKTHTRIH
YKCRDCGKSFSRSANLITHQRIH
YQCLQCNKSFNRRSTLSQHQGVH
YPCNSCSKSFSRGSDLIKHQRVH
YPCSWCIKSFSRSSDLIKHQRVH
YPCNQCTKSFSRLSDLINHQRIH
YECDVCQKTFSHKANLIKHQRIH
YECDKCGKTFSQSSNLILHQRIH
YECNECGKTFTRSSNLIVHQRIH
YDCNECGKSFGRSSHLIQHQTIH
YECTACGKSFSRSSHLITHQKIH
YECTECGKAFSQSAYLIEHRRIH
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YACKECGRNFSRSSALTKHHRVH 
YECTECDKSFSRSSALIKHKRVH
YKCSECGKSFSQSSILIQHRRIH
YKCSECGNSFSQSAILNQHRRIH
HQCNECGKSFIQSAHLIQHRRIH
YRCQECGMSFGQSSALIQHRRIH
YECSQCGKSFSQKSGLIQHQWH
YECRECGKSFSQKATLIKHQRVH
YECSQCGKSFSQKATLVKHKRVH
HQCNECGRGFSLKSHLSQHQRIH
YQCSECGKAFSQKSHHIRHQRIH
YQCSECGKAFSQKSHHIRHQKIH
YDCSECGKAFSQLSCLIVHQRIH
YKCSECGKAFNQSSVLILHQRIH
YKCDVCGKAFSQSSDRILHQRIH
FKCNTCGKTFRQSSSRIAHQRIH
YKCNECGTIFRQKQYLIKHHNIH
FKCNECGTAFGQKKYLIKHQNIH
FECSQCGRAFSQKQYLIKHQNIH
FECNECGKAFSQKQYVIKHQSTH
FKCNECGKAFSQKENLIIHQRIH
FECSDCGKAFSQKENLLTHQKIH
FKCSECGRAFSQSASLIQHERIH
FECHECGKAFIQSANLWHQRIH
FTCSECGKGFSQSANLWHQRIH
FACSDCGKAFTQSANLIVHQRSH
YKCHECGKAFSQSMNLTVHQRTH
YQCNECGKSFSQHAGLSSHQRLH
YNCNECGKALSSHSTLIIHERIH
YKCDQCPKAFNWKSNLIRHQMSH
YKCDQCPKAFNWKSNLIRHQMSH
YKCDVCGKSFGWRSNLIIHHRIH
YACHLCGKAFRVRSHLVQHQSVH
YKCQVCGKAFRVSSHLVQHHSVH
YECNDCGKAFVYNSSLATHQETH
YKCNACGRAFNRRSNLMQHEKIH
YKCNVCGKAFNRRSNLLQHQKIH
YVCGKCGKAFTQSSNLTVHQKIH
YECKECRKAFYDKSNLKRHQKIH
YECKECRKFFRRYSELISHQGIH
YECKECGKAFRQCAHLSRHQRIH
YECIECGKAFKQNASLTKHMKIH
YECIECGKAFKQNASLTKHMKIH
YECNECGKAFKRHRSFVRHQKIH
FECKDCGKVFRLNIHLIRHQRFH
YECKECGKAFRLPQQLTRHQKCH
HRCNECGKSLSSSSGLQRHQRIH
HACPECGKTFATSSGLKQHKHIH
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Q9ZlD8_MOUSE 1476 
Q9Z1D8_MOUSE 1477
ZF37_MOUSE 1478
Q62514_MOUSE 1479

O88282_MOUSE 1480
Q61065_MOUSE 1481
BCL6_MOUSE 1482
Q60585_MOUSE 1483
Q60585_MOUSE 1484
Q60585_MOUSE 1485
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OZFJYIOUSE 14 8 7 
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MFG3 JYIOUSE 14 8 9 
Q61849_MOUSE 1490



HACPECGKTFATSSGLKQHKHIH 
YECGECGKTFTRSSNLVKHQVIH 
FKCSECEKAFSYSSQLARHQKVH 
FECNVCGKAFRHSSSLGQHENAH 
YECNTCGKLFNHRSSLTNHYKIH
YKCDECGKSFSDGSNFSRHQTTH 
YKCGECGKAFSQRGNFLSHQKQH 
CDVCGKVFSQRSNLLRHQKIHTG
YECNECAKTFFKKSNLI IHQKIH 
YKCKDCEKAFSCFSHLIVHQRIH 
YKCNECGRAFGQWSALNQHQRLH 
YQCSLCGKAFQRSSSLVQHQRIH 

CGKVFILSGDLIKHERIH 

YECEQCGSAFRLPYQLTQHQRIH
FECELCGSAFRCRSQLNKHLRIH
FKCKLCESAFRRKYQLSEHQRIH
FKCQECGKAFWLAYLIEHQSIH
FVCKQCGEAFVNSSHLISHERIH 
FQCKECGRAFVRSTGLRIHERIH 
FVCKTCGKAFSRSDYLINHKRIH 
FVCKKCGKAFKRLGHFMNHERIH 
FQCKECGKAFSRCSSLVQHERTH 
FECKDCGKAFTVLAQLTRHQTIH 
FHCKVCGKAFTVLAQLTRHENIH 
FECKECGKSFKRVSSLVEHRIIH 
FECPECGKAFTHQSNLIVHQRAH
FECTECGKAFSRSSNLIEHQRIH 
FECQECGEAFARRSELIEHQKIH 
FRCTECGQSFRQRSNLLQHQRIH 
FACAECGQSFRQRSNLTQHQRIH 
FACPECGQSFRQHANLTQHRRIH 
YACAECGKAFRQRPTLTQHLRTH 
AECGKTFRQRATLTQHLCVHTGE
FRCEECGKSYNQRVHLIQHHRVH 
FKCGECGKSYNQRVHLTQHQRVH 
FECNQCGKAFKQIEGLTQHQRVH 
FECNQCGKAFKQIEGLTQHQRVH 
YPCPTCGTRFRHLQTLKSHVRIH 
YPCEICGTRFRHLQTLKSHLRIH 
YPCEICGTRFRHLQTLKSHLRIH 
YDCKECGKAFRVRQQLTLHERIH 
YDCKECGKAFRVRGQLMLHQRIH 
YECGECGKAFKVRQQLTFHQRIH 
YACKECGKAFNGKSYLKEHEKIH 
YTCKECGKAFSGKSNLTEHEKIH 
FICKECGKTFSGKSNLTEHEKIH 
YKCKDCGKCFGCKSNLHQHESIH 
YQCKECGKCFRQRSKLTEHESIH
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1511 
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08 8 631_MOUSE 


1520 


HKCKECGKSFFILSHLKTHYRIH 


Q61898_MOUSE


1521


YECKECGKSFIELSHLKRHYRIH


Q9Z1D7_MOUSE


1522


HGCDECGKSFTQHSRLIEHKRVH


O35738_MOUSE


1523


FKCADCDRRFSRSDHLALHRRRH


O89090_MOUSE


1524


--CPECPKRFMRSDHLSKHIKTH


Q64167_MOUSE


1525


--CPECPKRFMRSDHLSKHIKTH


O89087_MOUSE


1526


--CPECPKRFMRSDHLSKHIKTH


Q62445_MOUSE


1527


--CPECSKRFMRSDHLSKHVKTH


O89091_MOUSE


1528


--CPMCDRRFMRSDHLTKHARRH


Q61596_MOUSE


1529


--CPMCDRRFMRSDHLTKHARRH


BTE1_MOUSE


1530


--CPLCEKRFMRSDHLTKHARRH


Q62445_MOUSE


1531


FICNWMFCGKRFTRSDELQRHRRTH


Q64167_MOUSE


1532


FMCNWSYCGKRFTRSDELQRHKRTH


O89090_MOUSE


1533


FMCNWSYCGKRFTRSDELQRHKRTH


O89087_MOUSE


1534


FMCNWSYCGKRFTRSDELQRHKRTH


Q60843_MOUSE


1535


YHCNWEGCGWKFARSDELTRHYRKH


EZF_MOUSE


1536


YHCDWDGCGWKFARSDELTRHYRKH


Q60980_MOUSE


1537


YKCTWEGCTWKFARSDELTRHFRKH


O35738_MOUSE


1538 


YKCTWEGCTWKFGRSDELTRHYRKH 
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Q9Z0Z7JV1OUSE 153 9 

O70261_MOUSE 1540

EKLF_MOUSE 1541

Q61596_MOUSE 1542

O89091_MOUSE 1543

BTE1_MOUSE 1544

EGR2_MOUSE 1545

WT1_MOUSE 1546

WT1_MOUSE 1547

EGR1_MOUSE 1548

KR2_MOUSE 1549

O35700_MOUSE 1550

EVI1_MOUSE 1551

ZF29_MOUSE 1552

ZF38_MOUSE 1553

Q9Z1D8_MOUSE 1554

ZF29_MOUSE 1555

ZF29_MOUSE 1556

ZF38_MOUSE 1557

ZF29_MOUSE 1558

ZF90_MOUSE 1559

MLZ4_MOUSE 1560

ZF29_MOUSE 1561

MLZ4_MOUSE 1562

MLZ4_MOUSE 1563

O70162_MOUSE 1564

O35483_MOUSE 1565

O35483_MOUSE 1566

ZFP1_MOUSE 1567

GFI1_MOUSE 1568

O70237_MOUSE 1569

ZF29_MOUSE 1570

KID1_MOUSE 1571

KID1_MOUSE 1572

Z151_MOUSE 1573

O35700_MOUSE 1574

EVI1_MOUSE 1575

Q60585_MOUSE 1576

Q9Z116_MOUSE 1577

KR2_MOUSE 1578

Q61164_MOUSE 1579

P97365_MOUSE 1580

KID1_MOUSE 1581

ZF35_MOUSE 1582

ZF35_MOUSE 1583

ZF38_MOUSE 1584

Q9Z1D9_MOUSE 1585

Q9Z1D9_MOUSE 1586



YKCTWEGCDWRFARSDELTRHYRKH 
YACSWDGCDWRFARSDELTRHYRKH 
YACSWDGCDWRFARSDELTRHYRKH 
FSCSWKGCERRFARSDELSRHRRTH 
FSCSWKGCERRFARSDELSRHRRTH 
FPCTWPDCLKKFSRSDELTRHYRTH 
YPCPAEGCDRRFSRSDELTRHIRIH 
YQCDFKDCERRFSRSDQLKRHQRRH 
FQCKTCQRKFSRSDHLKTHTRTH 
FQCRICMRNFSRSDHLTTHIRTH 
YQCNECGKPFSRSTNLTRHQRTH 
YTCRYCGKIFPRSANLTRHLRTH
YTCRYCGKIFPRSANLTRHLRTH
FQCAECGKSFSRSPNLIAHQRTH
YVCTKCGKAFSHSSNLTLHYRTH
YQCDSCGKAFSYSSDLIQHYRTH
YQCGECGKNFSRSSNLATHRRTH
YRCPECGKGFSNSSNFITHQRTH
YICAECGKAFSNSSNLTKHRRTH
YECLTCGESFSWSSNLIKHQRTH
YECNECGEAFSRLSSLTQHERTH
YHCNECGENFSRISHLVQHQRTH
YKCLMCGKSFSRGSILVMHQRAH
YECEECGKSFSRSSHLAQHQRTH
YKCYECGKGFSRSSHLIQHQRTH
FACPECGQRFSQRLKLTRHQRTH
FPCPECGKRFSQRSVLVTHQRTH
--CDECGKGFVYRSHLAIHQRTH
YECSECGKSFIQNSQLIIHRRTH
HKCQVCGKAFSQSSNLITHSRKH
HKCQVCGKAFSQSSNLITHSRKH
YKCTECGQKFSQSSALITHRRTH
FKCKECSKAFSQSSALIQHQITH
CKCKVCGKAFRQSSALIQHQRMH
YVCERCGKRFVQSSQLANHIRHH
YECENCAKVFTDPSNLQRHIRSQH
YECENCAKVFTDPSNLQRHIRSQH
YECKKCAKIFTCSSDLRGHQRSH
YECTVCRKSFICKSSFSHHWRTH
YTCNVCDKHFIERSSLTVHQRTH
FQCSLCSYASRDTYKLKRHMRTH
FQCWLCSAKFKISSDLKRHMRVH
YKCSMCEKTFINTSSLRKHEKNH
YTCNLCSKSFSQSSDLTKHQRVH
YHCSSCNKAFRQSSDLILHHRVH
YWCSHCGKTFCSKSNLSKHQRVH 
YKCGDCEKSFRQRSDLFKHQRTH 
YKCDSCEKGFRQRSDLFKHQRIH 
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ZF3 5_MOUSE 
ZF35_MOUSE
ZF35_MOUSE
ZF35_MOUSE
ZF35_MOUSE
ZF35_MOUSE
Q9Z1D9_MOUSE
Q9Z1D9_MOUSE
Q9Z116_MOUSE
Q06054_MOUSE
Q9Z116_MOUSE
ZF29_MOUSE
MLZ4_MOUSE
ZF37_MOUSE
Q62514_MOUSE
ZF90_MOUSE
Q61491_MOUSE
ZF35_MOUSE
Q64247_MOUSE
Q61116_MOUSE
O35483_MOUSE
ZF29_MOUSE
Q61117_MOUSE
Q9Z2U2_MOUSE
Q61116_MOUSE
Q61117_MOUSE
Z239_MOUSE
Z239_MOUSE
Z239_MOUSE
Z239_MOUSE
ZF35_MOUSE
ZF38_MOUSE
O35700_MOUSE
EVI1_MOUSE
O35483_MOUSE
O35483_MOUSE

O70162_MOUSE

O88412_MOUSE
O88631_MOUSE
O88631_MOUSE
Z239_MOUSE
Z239_MOUSE
MLZ4_MOUSE
MLZ4_MOUSE
Q61116_MOUSE
Q61116_MOUSE
Q62518_MOUSE
Q9Z150_MOUSE



1587 
1588 
1589 
1590 
1591 
1592 
1593 
1594 
1595 
1596 
1597 
1598 
1599 
1600 
1601 
1602 
1603 
1604 
1605 
1606 
1607 
1608 
1609 
1610 
1611 
1612 
1613 
1614 
1615 
1616 
1617 
1618 
1619 
1620 
1621 
1622 
1623 
1624 
1625 
1626 
1627 
1628 
1629 
1630 
1631 
1632 
1633 
1634 



YPCSQCSKMFSRRSDLVKHYRIH 
YQCSHCSKSFSQHSGMVKHLRIH 
YACTQCPRSFSQKSDLIKHQRIH 
YPCAQCNKSFSQNSDLIKHRRIH 
YMCNHCYKHFSQSSDLIKHQRIH 
YNCDECDQSFAWSTGLIRHQRTH
YQCQECGKRFSQSAALVKHQRTH
YACWCGRRFSQSATLIKHQRTH
YECKQCMKTFYRKSGLTRHQRTH
YECKQCSKSFYTSSHLENHYRTH
YECQLCQKAFYCTSHLIVHQRTH
YECPQCGKTFSRKSHLITHERTH
YECVQCGKGFTQSSNLITHQRVH
YECNHCGKVLSHKQGLLDHQRTH
YECNHCGKVLSHKQGLLDHQRTH
YECNECGRAFRKKTNLHDHQRTH
YECNQCGRAFRQYVYLQCHERIH
YPCAQCGKSFSQRSDLVNHQRVH
YVCEQCGKGFIQLKYLLMHQRSH
YTCQQCGKGFSQASYFHMHQRVH
YRCVFCGAGFGRRSYCVTHQRTH
YRCGDCGKGFSQRSQLWHQRTH
YRCDICGKRFRQRSYLHDHHRIH
FKCWPSCTKTFTRNSNLRAHCQLVH
YRCDSCGKGFSRSSDLNIHRRVH
YQCHACWKSFCHSSEFNNHIRVH
YQCYECGKGFSQSSDLRIHLRVH
FKCDRCGKGFSQSSKLHIHKRVH
YHCGKCGQGFSQSSKLLIHQRVH
YKCGECGKGFSQSSNLHIHRCTH
YKCDECGKAFSQSSDLMIHQRIH
YDCKCGKAFGQSSDLLKHQRMH
YRCKYCDRSFSISSNLQRHVRNIH
YRCKYCDRSFSISSNLQRHVRNIH
YRCVFCGRSFSQSSALARHQAVH
YLCSNCGRRFSQSSHLLTHMKTH
FVCGECGRSFSRSSHLLRHQLTH
YECAKCGAAFISNSHLMRHHRTH
YKCMECDRSYIQYSHLKRHQKVH
YKCKECGKSYAYRTGLKRHQKIH
YECSKCGKGFSQSSNLHIHQRVH
YACEECGMSFSQRSNLHIHQRVH
YECNECWRSFGERSDLIKHQRTH
YECHECGRGFSERSDLIKHYRVH
YECNECGKRFSLSGNLDIHQRVH
YKCGDCGKRFSCSSNLHTHQRVH
YKCGECGKSFICSSNLYIHQRVH
CPRCGKQFNHSSNLNRHMNVHRG
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Q61116JVIOUSE 
Q61116_MOUSE 
Q62518_MOUSE
Q62518_MOUSE
Q61898_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
O88631_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
O88631_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
Q06054_MOUSE
Q06054_MOUSE
Q06054_MOUSE
Q06054_MOUSE
Q06054_MOUSE
Q61898_MOUSE
Q61898_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
O88631_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
O88631_MOUSE
O88631_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
KID1_MOUSE
ZF29_MOUSE
Q9Z117_MOUSE
Q61898_MOUSE
O88631_MOUSE
Q08376_MOUSE
Q60636_MOUSE
Q61116_MOUSE
O88282_MOUSE
Q61065_MOUSE



1635 
1636 
1637 
1638 
1639 
1640 
1641 
1642 
1643 
1644 
1645 
1646 
1647 
1648 
1649 
1650 
1651 
1652 
1653 
1654 
1655 
1656 
1657 
1658 
1659 
1660 
1661 
1662 
1663 
1664 
1665 
1666 
1667 
1668 
1669 
1670 
1671 
1672 
1673 
1674 
1675 
1676 
1677 
1678 
1679 
1680 
1681 
1682 



FHCSVCGKNFSRSSHFLDHQRIH

KCNVCQKQFSKTSNLQAHQRVH

YSCDVCGKGFSRSSQLQSHQRVH 

FKCDACGKSFSRSSHLRSHQRVH 

YKCRECDKSFTQRAYLRNHHNRVH 

YKCMECDKSFTHNSNFRTHQRVH 

YKCMECNKSFTQDSHLRTHQRVH

YKCIECDKSFTQVSHLRTHQRVH

YKCSECDKSFTQASQLRTHQRVH

YKCNECDRSFTHYASLRWHQKTH 

YKCKECDKSFAHCSSFRRHQKTH 

YKCKECDKSFAHYPNFRTHQKIH 

YKCKDCDIFFNHYSSLRRHQKVH

YKCKDCDISFIQISNLRRHQRVH

YKCRDCDISFSQISNLRRHQKLH

FKCRECDKSFTKCSHLRRHQSVH

YKCRECDKSFIHSSHLRRHQNVH

YKCRECDKSFIQRSNLIIHQRVH

YKCSECEKSFTCGSVLRKHQKIH 

YKCSECEKSFTVGSDLRMHQKIH 

YKCSECEKCFTWSDLRTHQKIH 

YKCSECEKSFTVGSSLRIHQRIH 

YKCECGKSFTVGSDLRKHQKCH 

YKCIECGKSFTNNSYLRTHQKVH 

YRCKECDKSFHESATLREHEKSH

YRCAECDKSFTRCSYLRAHQKIH 

YRCKECDKSFTECSTLRAHQKIH 

YRCKECDKSFTSCSTLKAHQSIH 

YICKECGKSFTRCSYLRAHQKIH

YVCKECGKSLTTCAILRAHQKIH

YECKECGKSFTTCSTLRIHQTIH 

YICKECGKSFTKCSTLQIHQKIH 

YTCKQCGKSFTRGSTLRVHQRIH 

YKCNICDKSFTECSSLKEHRKTH 

YKCEVCDKSFTVNSTLKTHLKIH

YKCEICDKSFTTTTTLKTHQKIH

YKCSVCGKSFTQCTNLKTHQRLH

YKCSVCDKSFTQCTHLKIHQRRH

YRCKECGKSFGRRSGLFIHQKVH

YSCPECGKSFGNRSSLNTHQGIH 

YKCKECGKSFPQLSALKSHQKIH 

YKCKECEKSFVQLSALKSHQKLH 

YKCNDCGKSFSYLSALQSHHKRH 

FVCEMCTKGFTTQAHLKEHLKIH 

FKCQTCNKGFTQLAHLQKHYLVH 

YKCEVCGKGFTQWAHLQAHERIH

YKCETCGSRFVQVAHLRAHVLIH

YKCETCGARFVQVAHLRAHVLIH 
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BCL6JYIOUSE 


1683 


O88631_MOUSE


1684


Q61116_MOUSE


1685


Z239_MOUSE


1686


ZF29_MOUSE


1687


Q62518_MOUSE


1688


Q61117_MOUSE


1689


Q61117_MOUSE


1690


Q61116_MOUSE


1691


Q61117_MOUSE


1692


Q61116_MOUSE


1693


Q61116_MOUSE


1694


Q61117_MOUSE


1695


Q61117_MOUSE


1696


Q61117_MOUSE


1697


Q61117_MOUSE


1698


Q62518_MOUSE


1699


ZF29_MOUSE


1700


O70162_MOUSE


1701


KID1_MOUSE


1702


TYY1_MOUSE


1703


REX1_MOUSE


1704


TYY1_MOUSE


1705


MTF1_MOUSE


1706


GLI_MOUSE


1707


GLI3_MOUSE


1708


ZIC2_MOUSE


1709


ZIC1_MOUSE


1710


ZIC3_MOUSE


1711


ZIC4_MOUSE


1712


GLI_MOUSE


1713


GLI3_MOUSE


1714


O70230_MOUSE


1715


MTF1_MOUSE


1716


MTF1_MOUSE


1717


O70230_MOUSE


1718


MTF1_MOUSE


1719


O70230_MOUSE


1720


ZIC4_MOUSE


1721


ZIC2_MOUSE


1722


ZIC1_MOUSE


1723


ZIC3_MOUSE


1724


O70230_MOUSE


1725


O70230_MOUSE


1726


MTF1_MOUSE


1727


O70230_MOUSE


1728


BTE1_MOUSE


1729


Q9Z0Z7_MOUSE


1730 



YKCETCGARFVQVAHLRAHVLIH

YRCEVCDKWFTLSSSLSRHQKIH

YRCEVCGKRFPWSLSLHSHQSVH

YKCDKCGKGFTRSSSLLVHHSLH

YKCGLCGKSFSQSSSLIAHQGTH 

YKCVDCGKEFSRPSSLQAHQGIH 

YRCEECGKGFSWSSSLLIHQRAH 

YKCEECGKVFSWSSYLKAHQRVH 

FKCEECGKEFRWSVGLSSHQRVH

YKCETCGKAFSRVSILQVHQRVH

YKCEECGKGFSSASSFQSHQRVH

YKCGECGKGFSHASSLQAHHSYH

YQCAECGRGFTVESHLQAHQRSH

YQCEECGRGFCRASNFLAHRGVH 

YKCEECGKGFTRASTLLDHQRGH 

YVCEECGKGFSQASHLLAHQRGH 

YNCETCGSAFSQASHLQDHQRLH 

YRCPECGKGFSWNSVLI IHQRIH 

YCCGECDLGFTQVSRLTEHQRIH 

YRCSECGKGFTSISRLNRHRIIH 

YVCPFDGCNKKFAQSTNLKSHILTH 

YQCTFEGCGKRFSLDFNLRTHIRIH 

FQCTFEGCGKRFSLDFNLRTHVRIH 

YQCTFEGCPRTYSTAGNLRTHQKTH 

HKCTFEGCRKSYSRLENLKTHLRSH 

HKCTFEGCTKAYSRLENLKTHLRSH 

FQCEFEGCDRRFANSSDRKKHMHVH

FKCEFEGCDRRFANSSDRKKHMHVH

FKCEFEGCDRRFANSSDRKKHMHVH

FRCEFEGCERRFANSSDRKKHSHVH

YMCEQEGCSKAFSNASDRAKHQNRTH

YVCEHEGCNKAFSNASDRAKHQNRTH

YVCTVPGCDKRFTEYSSLYKHHWH

FECDVQGCEKAFNTLYRLKAHQRLH 

FVCNQEGCGKAFLTSYSLRIHVRVH 

YQCEHSGCGKAFATGYGLKSHFRTH 

FRCDHDGCGKAFAASHHLKTHVRTH 

FKCPIEGCGRSFTTSNIRKVHIRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKVFARSENLKIHKRTH 

FPCPFPGCGKIFARSENLKIHKRTH

YYCTEPGCGRAFASATNYKNHVRIH

YRCSEDNCTKSFKTSGDLQKHIRTH

FNCESQGCSKYFTTLSDLRKHIRTH

FRCKYDGCGKLYTTAHHLKVHERSH

HKCPYSGCGKVYGKSSHLKAHYRVH

--CDYMGCTKVYTKSSHLKAHLRTH
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Q60 9 8 0_MOUSE 


1731 


HRCDYDGCNKVYTKSSHLKAHRRTH


O35738_MOUSE


1732


HRCDFEGCNKVTTKSSHLKAHRRTH


Q61596_MOUSE


1733


HICSHPGVGKTYFKSSHLKAHVRTH


O89091_MOUSE


1734


HICSHPGCGKTYFKSSHLKAHVRTH


Q60843_MOUSE


1735


HTCSYTNCGKTYTKSSHLKAHLRTH


EZF_MOUSE


1736


HTCDYAGCGKTYTKSSHLKAHLRTH


Q64167_MOUSE


1737


HICHIQGCGKVYGKTSHLRAHLRWH


O89090_MOUSE


1738


HICHIQGCGKVYGKTSHLRAHLRWH


O89087_MOUSE


1739


HICHIQGCGKVYGKTSHLRAHLRWH


Q62445_MOUSE


1740


HVCHIEGCGKVYGKTSHLRAHLRWH


O70261_MOUSE


1741


HTCGHEGCGKSYTKSSHLKAHLRTH


EKLF_MOUSE


1742


HTCGHEGCGKSYSKSSHLKAHLRTH


WT1_MOUSE


1743


FMCAYPGCNKRYFKLSHLQMHSRKH


ZEP1_MOUSE


1744


YICEYCNRACAKPSVLLKHIRSH


Q61479_MOUSE


1745


YICQYCSRPCAKPSVLQKHIRSH


O55140_MOUSE


1746


YICPYCSRACAKPSVLKKHIRSH


Q60636_MOUSE


1747


HECQVCHKRFSSTSNLKTHLRLH


SNAI_MOUSE


1748


CVCTTCGKAFSRPWLLQGHVRTH


P97469_MOUSE


1749


CVCKICGKAFSRPWLLQGHIRTH


ZIC2_MOUSE


1750


HVCFWEECPREGKPFKAKYKLVNHIRVH


ZIC3_MOUSE


1751


HVCYWEECPREGKSFKAKYKLVNHIRVH


Q62065_MOUSE


1752


HECKLCGASFRTKGSLIRHHRRH


Q62065_MOUSE


1753


HVCQFCSRGFREKGSLVRHVRHH


IKAR_MOUSE


1754


FQCNQCGASFTQKGNLLRHIKLH


Q9Z2Z2_MOUSE


1755


FHCNQCGASFTQKGNLLRHIKLH


HELI_MOUSE


1756


FHCNQCGASFTQKGNLLRHIKLH


Q61164_MOUSE


1757


HKCHLCGRAFRTVTLLRNHLNTH


Q61624_MOUSE


1758


HVCEHCNAAFRTNYHLQRHVFIH


P97475_MOUSE


1759


HVCEHCNAAFRTNYHLQRHVFIH


Z151_MOUSE


1760


YVCTHCQRQFADPGGLQRHVRIH


Q62511_MOUSE


1761


YICEYCARAFKSSHNLAVHRMIH


MAZ_MOUSE


1762


YICALCAKEFKNGYNLRRHEAIH


O88939_MOUSE


1763


YECNTCKVRFTRQDKLKVHMRKH


Q64321_MOUSE


1764


--CEVCGVRFTRNDKLKIHMRKH


P97365_MOUSE


1765


PHKCEVCGKCFSRKDKLKTHMRCH


O88939_MOUSE


1766


YLCQQCGAAFAHMYDLKNHMRVH


Q64321_MOUSE


1767


YSCPHCPARFLHSYDLKNHMHLH


Z151_MOUSE


1768


HKCEDCGKEFTHTGNFKRHIRIH


Z151_MOUSE


1769


YRCGDCGKLFTTSGNLKRHQLVH


Z151_MOUSE


1770


-KCRECGKQFTTSGNLKRHLRIH



Chicken database. 



35 finger units

SEQ ID NO

Q92010_CHICK 1771



YSCEVCGKSFIRAPDLKKHERVH
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Q90851_CHICK 


1772 


YPCTICGKKFTQRGTMTRHMRSH 


Q90850_CHICK


1773


YPCTICGKKFTQRGTMTRHMRSH


Q90851_CHICK


1774


--CDACGMRFTRQYRLTEHMRIH


Q90850_CHICK


1775


--CDACGMRFTRQYRLTEHMRIH


CTCF_CHICK


1776


HKCPDCDMAFVTSGELVRHRRYKH


ZKR1_CHICK


1777


-TCGDCGKGFAWASHLQRHRRVH


ZKR1_CHICK


1778


HRCGDCGKGFAWASHLQRHRRVH


ZKR1_CHICK


1779


HRCGDCGKGFVWASHLERHRRVH


ZKR1_CHICK


1780


--CPDCGKSFPWASHLERHRRVH


Q92010_CHICK


1781


--CHMCDKAFKHKSHLKDHERRH


O42408_CHICK


1782


HECGICKKAFKHKHHLIEHMRLH


DEFI_CHICK


1783


HECGICKKAFKHKHHLIEHMRLH


O42408_CHICK


1784


FKCTECGKAFKYKHHLKEHLRIH


DEFI_CHICK


1785


FKCTECGKAFKYKHHLKEHLRIH


O42409_CHICK


1786


YPCQYCGKRFHQKSDMKKHTYIH


O42409_CHICK


1787


FECKMCGKTFKRSSTLSTHLLIH


ZKR1_CHICK


1788


YECPECGEAFSQGSHLTKHRRSH


ZKR1_CHICK


1789


YECPECGEAFSQGSHLTKHRRSH


ZKR1_CHICK


1790


YSCPECGESYSQSSHLVQHRRTH


O42409_CHICK


1791


HKCQVCGKAFSQSSNLITHSRKH


O57415_CHICK


1792


YQCNICDYIAADKAALIRHLRTH


CTCF_CHICK


1793


FQCSLCSYASRDTYKLKRHMRTH


Q57415_CHICK


1794


YKCQTCERTFTLKHSLVRHQRIH


Q92010_CHICK


1795


FVCEMCTKGFTTQAHLKEHLKIH


O57415_CHICK


1796


-TCPYCPRVFSWASSLQRHMLTH


O57415_CHICK


1797


HSCSICGKSLSSASSLDRHMLVH


O57415_CHICK


1798


--CTVCNKRFWSLQDLTRHMRSH


Q91051_CHICK


1799


CVCKICGKAFSRPWLLQGHIRTH


O12939_CHICK


1800


CVCKMCGKAFSRPWLLQGHIRTH


O57415_CHICK


1801


YKCSVCGQSFTTNGNMHRHMKIH


IKAR_CHICK


1802


FQCNQCGASFTQKGNLLRHIKLH


CTCF_CHICK


1803


HKCHLCGRAFRTVTLLRNHLNTH


Q93567_CHICK


1804


YECNICNVRFTRQDKLKVHMRKH


O93567_CHICK


1805


YLCQQCGAAFAHNYDLKNHMRVH
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022 08 9 PETHY 


1806 


HECSICGEQFLLGQALGGHMRKH


O22088_PETHY


1807


HECSFCGEDFPTGQALGGHMRKH


O22087_PETHY


1808


-ECSFCGEDFPTGQALGGHMRKH


Q39092_ARATH


1809


HKCKLCWKSFAWGRALGGHMRSH


Q39217_ARATH


1810


HKCSICSQSFGTGQALGGHMRRH


P93713_PETHY


1811


HECSICGLEFAIGQALGGHMRRH


O22086_PETHY


1812


HECSICGLEFPIGQALGGHMRRH


O22085_PETHY


1813


HECSICGMEFSLGQALGGHMRRH


O22084_PETHY


1814


HECSICGMEFSLGQALGGHMRRH


Q42453_ARATH


1815


HPCPICGVKFPMGQALGGHMRRH


Q42410_ARATH


1816


HPCPICGVEFPMGQALGGHMRRH


O65150_TOBAC


1817


HVCSICHKAFPTGQALGGHKRRH


Q40897_PETHY


1818


HVCSICHKAFPTGQALGGHKRRH


Q40896_PETHY


1819


HVCSICHKAFPSGQALGGHKRRH


Q42430_WHEAT


1820


HRCSICQKEFPTGQALGGHKRKH


Q40899_PETHY


1821


HECSICHKCFPTGQALGGHKRCH


P93166_SOYBN


1822


HECSICHKSFPTGQALGGHKRCH


Q96289_ARATH


1823


HVCTICNKSFPSGQALGGHKRCH


Q42423_ARATH


1824


HVCTICNKSFPSGQALGGHKRCH


O22533_ARATH


1825


HVCSICHKSFATGQALGGHKRCH


Q40898_PETHY


1826


HECSICHKCFSSGQALGGHKRRH


Q38895_ARATH


1827


YTCSFCKREFRSAQALGGHMNVH


O23621_ARATH


1828


YTCNFCRREFRSAQALGGHMNVH


O80942_ARATH


1829


YTCSFCRREFKSAQALGGHMNVH


P93714_PETHY


1830


HECSYCGAEFTSGQALGGHMRRH


Q43614_PETHY


1831


HECAICGAEFTSGQALGGHMRRH


O22083_PETHY


1832


HECSICGAEFTSGQALGGHMRRH


O41070_PEA


1833


HECSICGAEFTSGQALGGHMRRH


Q42375_ARATH


1834


HECSICGSEFTSGQALGGHMRRH


O65499_ARATH


1835


HKCNICFRVFSSGQALGGHMRCH


O22090_PETHY


1836


HECPVCFRVFSSGQALGGHKRTH


O22082_PETHY


1837


HECPVCYRVFSSGQALGGHKRSH


P93717_PETHY


1838


HECSICHRVFSTGQALGGHKRCH


O04177_BRARA


1839


HTCSICFKSFSSGQALGGHKRCH


O04176_BRARA


1840


HTCSICFKSFSSGQALGGHKRCH


P93715_PETHY


1841


HQCSICHRVFSSGQALGGHKRCH


Q39092_ARATH


1842


HECPICAKVFTSGQALGGHKRSH


P93719_PETHY


1843


HECPYCDRVFKSGQALGGHKRSH


P93718_PETHY


1844


HACPFCPRMFKSGQALGGHKRSH


O22091_PETHY


1845


YECPLCFKIFQSGQALGGHKRSH


Q42430_WHEAT


1846


-KCSVCGKSFSSYQALGGHKTSH


O04177_BRARA


1847


YKCTVCGKSFSSYQALGGHKTSH


O04176_BRARA


1848


YKCTVCGKSFSSYQALGGHKTSH


Q96289_ARATH


1849


YKCSVCDKTFSSYQALGGHKASH


Q42423_ARATH


1850


YKCSVCDKTFSSYQALGGHKASH


Q40897_PETHY


1851


YKCSVCDKSFSSYQALGGHKASH


Q40896_PETHY


1852


YKCSVCDKSFSSYQALGGHKASH


Q40898_PETHY


1853


YKCNVCNKSFHSYQALGGHKASH


O65150_TOBAC


1854


YKCSVCDKAFSSYQALGGHKASH


P93166_SOYBN


1855


YKCSVCDKSFPSYQALGGHKASH


Q40899_PETHY


1856


YKCSVCGKGFGSYQALGGHKASH


O22533_ARATH


1857


YKCSVCDKAFSSYQALGGHKASH
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C\ Q VTTtZA 1 1 Q1 


1 QCO 
X o 3 o 


VTH'D'Pf niSJQ T T?"n r TQPiT< r T?a T^TJMQ QTT 
X X Ir xvk_l\! o X r U ± o^xvl? i^HXll v lo oxl 


09^(^91 /AC\-(Z.O 
UZ j o / ± / ^u oz 


X O 3 _7 


VTP1\TT?PPPTh , 'R 1 P q z\n AT .nnUMTvn7TT 
x XvwXMr v^.rCx\.riii? Jxoi^v^i^Xj^kjrll v liM Vri 




X O D U 


T-T'prr 1 T-TT . n Q Q TP PTCrriT? 2\ T .r^r^tT-JTA/rTTQT-T 




X O D X 


w lL.oL.uoJJr xvxllvxvo l^iSAJ ±1 V r\J-i.Jr LjiMv^rrl 




1 P. O 

x o b z 


Jr iXlYlCvjTXV^iJ? x-i V ISXzLJ Mix. X rlJzi ivLM v_ 


0.0 o c; "3 "3 / q q _ i i i 
V>>Z z j j J / a b? x x x 


x o b -j 


X i\A_o V LJJAiir o o X ^^X]^^JtXr\_r\orl 




X o b ft 


XJTr rp Q t r^TJTTG TP A TPPiA T .r^fHTTTT'D r^TJ 
rlVLol L^ilxVO-t 1 ^^/\XjL?^MAKv^-ti 


PkQGTsTQA / 1 A Q — 1 "7 1 


1 q ir 

X O b 3 


TT"Mr^Q T HTPT^Q Tr"DQr^.07\ T .ntrHTTTTP f^TJ 


O Q QTS.TO A / Q /I "1 1 £T 

b? oi-M ^^t / Z? ft X X b 


X o b b 




HQCTTT / l "1 *7 1 /I n 

yybii / /XX / ~ x^t u 


1 Q cz H 
lob / 


X r L^Lj V L-XJJKK.I? X 1 1M Ji i\.Xj X 1\! rlr rv^X xl 


P, Q C TTX/l"} / l O 1 *3 O H 


1 P (Z. Q 

X o b o 


T .T^T 1 "DTATTTr 1 r 1 Tf WIT TP TTTaTATaT Q TP TTTTT T TPA 7XT 


yybi i v Jj / iz4o-izoo 


1 P £T Q 

X o b b? 


X y v^1\11 v 111iLtL. 1 l v lo r o o Ji J\.^Xj1 v 1Xji11SX<.1\ X 


Ann T'TV/TO / 1 O "7 1 T OQO. 

yybiNj/ xz / x — x<4j7vj 


10*70 
X o / U 


Trr i r >i r 1 TrTCTTPT?GTTVvT ^roTJOTPTnu 


^ b? O 1 jyio / 1 JZQ - lOOZ 


X O / X 


X V L,/\JiiriJL-vj , v»J JL r JXi? V bur o Kxj.ivK.iv 1 Vj-rl 


y^bl l v l3 / lZi?D~iJZU 


lo / Z 


T Trr^ -dTaT Trr* r 1 Tf MT 1 TP TTTaT ATaTGTPT'TPTTT'DT 7TJ 


UoXoUX / Xffc z - Xb4fc 


1 o / o 


DMPT\"n Tr^r* xrr* tp a gtaTTTax/'tp^tjt pau 


UuDOUl/ OX"~OJ 


X o / ft 


r^T^(^T7T^r^GT?T?T?P 1 Q T3^77vTTrT?T?TT7\T"PTA/TTT 


yybr xb/ XXo-X^fcU 


1 O / 3 


VXTr^ 1 GA7P 1 TAT< rr T , Tr G CVPAT.PPTUT^Ti GTJ 

X r\A_.o V UJJxvl r oo X y/\JjvjLjrliS^oxl 


yi?br ib/1 / 4fc - X b? b 


1 Q.H CZ 

lo / o 


rlVLl 1 Ivo V ir O Lrr^/\ljLjVD-JtliVKL^rl 


UbDzftO/X^fc / — X / X 


1 P *7 *7 
X O / / 


TPVT^TPT .r^GT^OVT?T , T7T^T?TTT?r i TTT >G GVTATJ 
r X L^rLXjL-oxvs^ XKJL V l v lriiJr Jiorlljoo XJJn 




T Q 1 P 
X O / o 


TP G r 1 7\TVP 1 O.T? TTT7VG GO A T .r^P'TJO'KTA TJ 
-F o v.l\l X LyKJ\r X o ov^/\JjLarvjxXyi\li-irl 


HQCCTaTH /1 1 q t a n 

yybbvvU / 1 io - l^t u 


io / y 


■ljt re* q t rp o- TT G T? A T 7 f2 Pi AT. CC^IXWV? TT 
nVLo V v- vjxvo J? i-il ^^i^ljvj^rlrvKv^rl 


i^i? o o VV U / / O .7 / 


X O o U 


VTT n P^ 7r* VT^T'TTGGVO A T.^tr^TTTT A GTT 


1^3 b? ZD Z / b X O O 


1 OQ1 
X O O X 


TTG nT\TVr 1 OT? r P"PVG GPi A T .^fVlUOlvTATT 
ir oLiM x L-^xv J. ir x oo^/±ijLTij7irx^x\Jjf^Jri 


OQCGTaTI / 1 £T.A 1 Q £C 
yybbl/vl / XD4t — XOD 


1 Q Q O 
looz 


umpQ Tr*T?TTGTPA QPPAT PT^TTTf T? r^TT 
nlLolLr xvo Jf /\o Lzr^/lijLjvzrJrlx\.Kv>rl 


HQCCTaTI / Q "7 _ 1 1 Q 


1 Q P ^ 
X o o ^5 


VTTP'TA 7 n PI! TTG TP G QVPi A T .f^fXP'TfT 1 GTT 
X IvL^ JL V LoJVor OO X ^jBuXiv^L^xXxvX oxl 


HQ 7DTTI /l ACT ~\ CZH 
\lz> Ziir 1 U / X4t 3 - X b / 


T Q Q A 
lOO^i 


TaTa/pt^t? p g TTP i 'v a xrr^ G"nv"Tr a ttt .tttt 1 

W V L-iiKL-. o xvLj I/iVyaU X x\J-irll_iA.± v_ 


nQ7D r rn / tz h — q q 
yyzjiriu/o / o b? 


1 O Q C 

X o o o 


X XL^11iXv-.x\J^^jx7 v^KiJ^xM xj^i v ixlKKKrl 




T Q Q £T 

X o o b 


TJC r^T^rT^TP^ZTP CTP"K7TP OPT TPTTr^FVMY"" 1 
xlo v_.UL-VjiK V J? o^<.VriJOx , 1 xIjXI^IJINIv-. 




10 0*7 

X o o / 


TP G PTVTVPnP TTTP V G G C\ A T .^2 PTJPkAT A TJ 


yyb o jj x/^yx-j5Xo 


T Q Q Q 

looo 


lAllLl L.vjroJJr KJriKKo JjxvJJxlX Ko ir ioovjiri 


OOOr^T^I / "~> eZ tZ O Q *7 

y b? ovxDX / Z b 3 - z o / 


1 Q O Q 

X o o b? 


17CPPT/PPOT 7\ t rtrp»T^T A 7"p r PTJT?TnsTP 1 


W o v^xJ X / lou-zuz 


7 Q o n 

X O Z7 U 


ir AC oXCorvXrixixvX iMlMMy iYirli v i IA/ vorxl 


HQGCTA79 / "1 H £C 1 OD 

yyobi/vz / lUb-lzo 


1 Q Q 1 

X o b? X 


VTTr^KnJPTP XT A TJDCVP7\ T .PPXJXT A GTT 
X ivLXN V L-xTjlAjB-ir ir o X s^i^XjLT^xXiSAorl 


PiQOCTa79 / l /rc 1 Q T 

SJb^ooWZ / X b 3 - X o / 


1 Q Q O 


T4TTPCT r^XITfK /TPTDTT^OAT r^OUVD (^TJ 

iririiV-o X Uirlrvv Jf ir 1 ^^/\ljvjvjrlJtvKL-rl 


oo £T a / cz n qo 


1 Q Q Q 

lobj 


T4"n i PPiVPPir , n i 'n , 7\\TCP7\ t ppuhata tt 
irLjZiL-s,^ X L-LjxvJixr /\x>J byALbljnyi\l/irl 


DOT Q1 c / ^ "3 n 


1 Q Q /I 

x o y 4t 


r\T?r~ i 7\\Tr i 'W"D\T'n i T g ctjpt t gtjvtxta atj 
yHjC/W L-xVKV Jf iJOoJtiyXjX oJtlxxSl/\/lrl 




1 QQC 

x o y o 


VtrPnVPPDT'TrTVATCnAT ppuama tt 


yjyzbb/ by-ol 


1 Q 

X o b? b 


JT o L.1M X L-KK-xVir X o o^/\ijvj^iriyi\ljfyri 


Z O / / X X 3 


1 pQ7 
X o / 


■pTPPTTVPTPP'Kr'P PT G n AT .f^nTTn"M A TT 

XT X_j v—XX X v_> X? XVLM XT XT X O V^/^XJLJVjXxV^XNl^i.Xl 


Q9SVY1/301-323    1898  FMCRKCGKAFAVRGDWRTHEKNC
Q9SVY1/217-239    1899  FSCPVCFKTFNRYNNMQMHMWGH
Q9SGH2/1804-1827  1900  IHCLICHKTFASDDEFEDHTESKC
Q38895/47-69      1901  YTCSFCKREFRSAQALGGHMNVH
Q9SLB8/49-71      1902  YTCSFCRREFRSAQALGGHMNVH
Q9SL35/188-210    1903  HECSICGSEFTSGQALGGHMRRH
Q9SL35/113-135    1904  YECKTCNRTFSSFQALGGHRASH
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VJoXUXZ>/t:_7 / X 


1 qnR 
x 23 u o 


JClJ? v^. V X L-Hj xVv^I? o O ^rxVc-i X vjj vrrix Vxll Szl 


UOIUIJ/ X X ~ X *± X 


7 Q P) ^ 

X 23 VJ D 




097^ qc; / cz cz a _ ^ p cz 

\J Zi Zj Zj 23 1 O / DD^i DOD 


X 23 U / 


Juriv_iij j^v^vjTx\-c\J_is>z- t ^ x izji v iJirs_n.XjJt\. v sz szx 


K^J 23 iZD J~ 23 / / J 'i J D 


X _7 U O 




OQcjTq7 / r 7Q_i nn 

\ > J23i22>A.23// / O 1UU 


X 23 \J 23 


uynpT 171^ Z\Vf^nZ\T ,(^{^ WIN/FT? TfW 


nQQp p A / p c _ c: *7 


1 Q1 0 
X _7 X U 


V\ 7"P Q P T T? r^TT 1 QTvT2\ n A T .rtir^TTM>J T "P* 

X VLor ^1 JA-L^r 1 O ±M ^-i.v^j^XJ O- ^ JTli v llM X XZL 


04. 9 4.QC; / £ ft - Q D 


i qi i 

J- 23 J L 


*PC!P'KTVPnPK"FrVC' QO ATintnTTO'KTA'H' 


\J O Zj O 23 / X O X *± _7 


1 QI 9 


17PPKTQPrtT7T"PPTrT'M'T.T,'P , 'KrPrT A TTTTT 
I? ir O V_^JZt X ±Z irxN-XlNJ.I U 1 PilM JTlXj-xX JVlI 




l i j 


VnPT^T'PT^PT'PPCl'FrnAT.nf^TTT? A QT-T 

X VdV-.x\.X L-JJJa. 1 r r Dr V^^^XJOrorxj.x\_t\kD XT 


P QQPiY P / O (C 1 _ O p P 
^ A. O / O X Zj O 3 


X 23 X ~r 


"PTT7PPT nn A Th , T7TC!PO AT .(inWMPPy 


UO J^i?i?/ zi Z. Zi zl *± ^± 


i qi 

X 23 X 3 


wptptjt PTrp\7"pr q QnnaT.nnwMP nw 




1 QI C 

X X o 


P DnTT?nnD T-TTTTATQTATTr A T .T^n^TMP PTT 


n^CIAQQ / l f^Q T ft A 
UuD^tj'i'/ X O Z. — X O r± 


1 QI 7 
X 23 X / 


T?T7nnpnTTTr'\7T?pQ wn at .nniTP a qtt 


AQG piWf/l / o 9 n 9 A P 


1 QI O 


*R'X7 , pPTrnQpnT7PPip^7'nT ,t .~k~tzt t "nT<r'nTT 

U V ^ Jr JX^orCLji? JXJ_> ir VXJXJXJi^XXXiJr^XJxl 


\^Z3 KZ> AOZ3 / OU"lUZ 


X 23 X J7 


X r\.v_ o v ^_XJ JMruul v^i^XJvjjVjjrii\i-i.o szl 


y!70<£;o!7/ X 3 o — X 3 o 


1 QOA 
J-23 A\J 


TT-r rpmj niSTTf QTTPQPnAT .PP,"P"TTP nTT 


^J70L.SJ.O/ -L 3 J? — J. D X 


1 Q 9 1 


Ta7 "k" n "Pi TTn QTrT^V A "\7n Q"nTA7T<r aptqt^t n 


O Qqnn^ / l CZCZ X ~.'\ ft "7 


1 Q 9 9 


VT^PTlPnTT .TVCppnc'c'TT'H'P A T?n 
X i\.V_iJV_Vcr X XJJT oXXJa-XJO i? X X XlJXi-iir V_ 


n qqnn^ / £: p _ p c 
y^oLyD/ 03 03 


"1 Q 9 p 
X 23 A 3 


i? v v_rii x \^i\]r\AjT.r ^xxj^^iNXj^xjJCuiXiXkjri 


PQQPQ-1 /nn — QO 
yi/DJ: Dl/ t \J 23 A 


1 Q 9 A 




P Q G P Q 1 /1 7fl 


1 QQC 

X 23 A ZD 


Inl X Xli Xv ^ o ivvu X jtrX V ^DlJI XV/^XTXjXVX v_- 


PQQPQI /l 7c;„i Q £ 

.7 3 X? O X / X / 3 ~ XZ3 0 


1 Q 9 £ 
X Zi d 


PT Q PF)PritP'\7P QP"\7*P Q P T PPTP'nT'P 


noaqaf; /cr^R-i^Qft 

_7 O Oil O / 3 / 3 — J70 


1 Q97 
1-/Z / 


X XT-^XJ X ^-rXxV X X? x-JlOXJXJX1iX7 XZjXJXI X Xli Di\L» 


v^rr Zx*± A.\J / Zj 23 S. 


X 23 -Z o 


PTPTfTPT •TTnPTTQPnAT.PPl'P'P A QPT 
x? x V-.JA.X ^xjXvv^x? no x? ^i^xjovjjrixxi^o rl 


PA 9 A 7 PI / ft 9 — 1 OA 
vj. *± .«£ ~fc X U / O .Zi X U ,£ ± 


1 Q9 Q 

X 23 Zi J7 


TJ p P p T PP^ 7"P TP PMPrPi A T , PfTtTTMP P TT 
XrLXT^XrXVwV-^VXZjX? Xr l v l^V^je^XJ^VZTXll v lXXXXrr 


npypp^ / i 9 _ p c; 

LJ j?yVJ? Jr O / 1Z J) J 


1 Q "3 n 

X _7 3 U 


"\7Ta7P VVPTlP PP*n*nP"PTT T A/PPTPTf A TTPT 

V WV_* X X k-XJX\.X_i IT XJX-/XL XvX XJ V V^Jrl^XVt^XAXT 


riQYT7P^ / p CZ _ CQ 
K^Jz/J^r Xr o / 3 O 3 


X 23 25 X 


X7 XvV_XX V Vw-XlXVXvXJO Xi-iO Vj1 v 1 V X XZl V XjV^J V XT 


0999^fl /91 fl_9A1 


X 23 3 Zi 


\/'QPP^Pl^]^T I PKr^P'NTAT»PC''P"l\TTrA TTPT 


nA9A^"3 /AH — £T 9 


1 Q P P 
X -3 o 


PPPTfTPT ,TfPP Q QT?n AT .PPPTP A QTT 
X? X\.v_.Xv X V_XjJXX1iX7 Dor ^i^XJ^L^riJXi^OJcT 


Lv^r zi *± 3 3 / OD XUO 


1 Q ^ A 
X J? O L x 


TJDPD T HnXTT? PPMPPAT.PPtPnVTPPTT 
XTXt^-XtXv^,VjjV XvX? Xt l v iLj^i^iJVjTkjjXrLl v lXXXXJrl 


OA 9"37c: /1 1 "3 ipc: 


1 QOC 


VPPT^TPTvTP TP Q Q PP A T .PPPTP A QPT 


L^ .Z, o / 3 / X o o — 


1 Q P CZ 


XZTT? nQ T PPQPPTQPP A T .PPTTMPPPT 
Xlrii^o X k_ljoXiX7 X o^yi^XjLjL^rlMKxXrT 


no o t c o / 1 ir: q ipi 
Uzz / oi? / lDi? - X o X 


1 QQ7 

x y o / 


Ta7TZPPT^"P C TTPVATJn QPlTA7X<r a tttttt n 


n99 r 7t^Q/i p^ o nn 


1 Q P P 


vDnnnnTT pqpp"pi , tpt t'Ttp a pp 

X iX.L-XJL.Vjr X XjT oX\.i\xJ X X 1 X X XXXXjflr L- 




T Q P Q 


P"\ 7PP T niSTTTPPPP PimVTT .PT .PTP PPU 
r V L-Xli X L-XMrvLrX? vJXXXJLJ±\IXjL^XjXTLXX.X\.LrrT 


nQ r 7TTT P / PI "1 HP 
LJ;^ Zj UXj3 / o X — X U 3 


1 a a n 

X 23 4t U 


PT P P"K rP70"FTP P PP PPTsTT PT.PTPPPT-T 
r X L.X1j V L.X\lX\Jor LJX\.11iLJx\1XjLJxjMXa.X<LtXx 


OQ7TTTP / 1 ^ *7 1 7Q 
LJ Zj UXj3 / 1 J / — X I Z3 


T Q A 1 


TA7TrP"nTrP Q TTP V A\7"P Q *nTA7T<r A PT Q TTT P 
W X\.L-XJ Xvv_- O XvX\. X £\ V ^ o XJ Vv rv/-i.Xi O Xv X L^ 


HQ7TTTP /l fiA 9 0^ 
L^ Zj U Xj 3 / X O ^ — ^ U 3 


1 Q A 9 


VPnPinnTT.PQPPT^QPTTPTP a pn 
X JXL.iJL.Lr X XjX? OXvJXXJo X? X XJn.JX-rt.Jr L. 


nq^Tci /qc^i 1 7 
Xr 3 / 3 X / 3 X X / 


1 Q A P 
X 23*1 J 


rp 17 PXJVPPP "KTP PT 1 Q P A T .P PPTPTsT A PT 
i? Ji L-Xi. X L. J? XvXM X? Xr X Ov^/iljVjVjnL^XNli^XT 


08 1 R97/1 96-91 9 


1 944 


VRrHKPGTRKFSKTiRAAKAKTTTiTKH 

v kPV^lXXV\_>\JlJ JlvJ. l~J x\.J 1 ijrxrliJrl.1 11 X J 1 X 1\_L -L 


Q9ZUL4/82-104     1945  WKCEKCSKRYAVQSDWKAHSKTC
Q9ZUL4/109-130    1946  YRCDCGTIFSRRDSYITHRAFC
Q9ZUL4/6-28       1947  FICDVCNKGFQREQNLQLHRRGH
Q9SHD0/194-216    1948  FKCETCGKVFKSYQALGGHRASH
Q9SHD0/243-265    1949  HECPICFRVFTSGQALGGHKRSH
Q9SHD0/4-26       1950  YKCRFCFKSFINGRALGGHMRSH
064936/131-153    1951  YQC1WCGRELPSYQALGGHKASH



WO 02/099084 



PCT/US02/22272 



113 



Ub^ _? J b / X / y — £\j X. 


i q c: o 
x y o a 


OTJ'D Th 1 T?G , T , P 1 "P"GT fPlJVDT U 
rlxvv^ o X L-xlxCHi r o 1 uririo XjLjLjJrLJ5J7tXjJrl 


yjOlu U/ DO — O / 


1 OCT 


xlir 1 jiL-VjxvyJr J_irvi-iljr vjrll v lKv_rl 




1 OKA 

x y 3 ^± 


r JiL^iJLjL-iA-xvV r ^oriyAXj^^iriKAxri 


Ann T.TH /Oil 


X y 3 3 


rUbC^-XNJ ILbKvr oo^yAJj^LxrLIrlK^ri 


iDijiJ^t / 4t /-by 


x y d b 


r ii ix x L-1NJ IVKr oo r yAljljLjriKAori 


yyoxju^/ y^— XXb 


x y 3 / 


rixvL- oXL-oyoJrvjj. LjyAXj^LjJ^lYlKJKXi 


AQ7TTOQ / 1 O 1 T A *3 

yyziuyo / x^ x - x^t o 


t q c: Q 

x y 3 o 


pririnT ^V*TvT"D"C ir PCTP"C 1 T?T7'CI"\7XTT TT? GP* 
r JiL.ir X L.i\x\l Jr r* 1 oJiJiJi V o VJrL VJiov^ 


yypr 1j/ X / / — ZUU 


1 Q ET Q 

x y 3 y 


P 1 A PDOPPT?^ 7"G 1 "DT< r T T?CT "n , "U"~UTP\ 7\ T 7TD TT 

LALryLbJi V r Jr JXXjJioXiJiJrLJrLy/iVx<.rl 


/~>i Q r7/ r> v"C 1 Pi / D A A OCZ.CZ 
yyZiyiiU/ /4t4-ZDD 


x y b u 


V I T I P 1 "DT< : ''P 1 "MP 1 \ FT? "NTT G PiTTT? A A UMC GTJ 


y^t .<£ 4fc 3 / ol)-lU/i 


1 Q CZ 1 

x y b x 


vtt n c\rr 1 'nF'T , 'c , c q vn7\ t c^r^ui^i\ gxj 
x i\.L^o V r o o x yAiibbnJ\A.Dn 


y 4fc Z*±Zz5 / 1oo-1jo 


x y b z 


Jti V X X L-lNlIvor ir oljy/\Xjvj^WJt\.KL-xl 


/"\ q r7T A yj\ CZ / 1 A CZ 1 £TQ 


x y b J 


TfjTST~ i xri<rr^7\ x<r~o vat g tt\X/Jt^~7\ tug ifi^r^ 
W ivL^ Ji J\.L-/\ rs-K. x /\ V y o JJ WJtvAxloivl 


p\ q ^tat a cz / i ^7 o i q /i 

y y Zi w Ab / x / j - x y 4t 


1 Q CZ A 

xyo4 


VD P 1 *P\P 1 P 1r T 1 T T?CX?'DT\CX7 1 T r PT-TT3 A X7P* 


yyzjWAb/ /u-y^ 


x y b 3 


t JuK^hs X UCjJ\.Ljr yKJjyiNIXjy XjJrlKKCjjirl 


no n o/i o / o o r~ i 

Uouy4z / jy-oi 


1 Q CZ CZ 

x y b o 


VTP OIPPDDiriPI/C! AP7\ T P*P 1 XJT\A'\T1 7TU 

xlLor UKKJir Jxoi\y/\Xj^^hLlYliNl Vri 


yj y^x / / yu-iiz 


iyb / 


UFr 1 C'TnC!AO , Dr tr PPn7\T r-i /—iTT'IV/T'D "D TT 


yj yzi / / 4o-ob 


x y O O 


U'C 1 PT7 r T , r , 'NTI( ,r D TP G C T?r\7\ T PPTJDTV CU 

r JiUJxl L-JMiVKr oor yAijtjLjMKAorl 




1 Q £T Q 

± y o y 


r JiUJi 1 (^JiixV± H Jxo xyAxj^CjJd.KAoil 


yj yuyz/ zuy-z ji 


i q ""7 n 

x y / u 


UI7P'DTP7\I(j r TrtP r PC!PP7\T OP i T-TT<rD GTU 


yoyuyz/o — z/ 


1 Q *7 1 
X / X 


Xl ivL- iv Xj Vv xS-O i? J-xvi \srx<-t\A-j Ljvjrll v l iX. O xl 


081793/138-160    1972  PVCHICGRGFGSWKAVFGHMRAH
064828/530-553    1973  LQCIPCGSHFGDKEQLLVHVQAVH
064828/599-622    1974  FVCKFCGLKFNLLPDLGRHHQAEH
064828/496-519    1975  FACAICLDSFVRRKLLEIHVEERH
049591/251-278    1976  FMCLYCNELCRPFSSLEAVRKHMEAKSH
049591/26-50      1977  LTCNACNMEFKDEEERNLHYKSDWH
049591/90-114     1978  YTCAICAKGYRSSKAHEQHLQSRSH



There follow several examples of how to construct and select DNA-binding sub-domains
from libraries of natural zinc fingers.

Example 4: Human Zinc Finger Module 'Mini-Library'. 

As a preliminary test of the efficacy of using natural zinc finger modules for constructing
novel DNA-binding domains, a 'mini-library' of natural, human zinc finger modules is
generated. The mini-library comprises 8 zinc finger modules, which have the following
nomenclature assigned to them in the human genome database: Zif268 finger 1, Zif268
finger 2, Sp1 finger 3, WT1 finger 1, O15391, O75626, ZN45 and Z165. Since there is
more than one zinc finger module belonging to each of the zinc finger proteins ZN45 and
Z165, we have called the selected modules ZN45-(AAA) and Z165-(GCC) respectively,
according to their predicted binding sites. We have also predicted the binding sites for the
zinc fingers O15391 and O75626. The preferred binding sites for Zif268 finger 1, Zif268
finger 2, Sp1 finger 3 and WT1 finger 1 are already known. The amino acid sequences of
each of the stated modules, and their predicted or previously determined binding
sequences, are shown in Table 3.

Two 3-zinc finger peptide libraries are prepared, containing the 8 zinc finger modules
stated. All novel 3-finger peptides contain a leader sequence, MAEERP (SEQ ID
NO:16), at the start of the peptide and are tagged by the sequence
LRQKDGGGSYPYDVPDYA (SEQ ID NO:1989) at the C-terminus. This sequence
provides: (in the absence of a further C-terminal finger) a suitable terminus to the final
α-helix of the peptide, -LRQKD- (SEQ ID NO:1987), as found in wild-type Zif268; a short,
flexible linker sequence, GGGS (SEQ ID NO:2121); and an HA-tag (YPYDVPDYA
(SEQ ID NO:2122)), which is recognised by the HA-antibody. Adjacent zinc finger
modules are fused using the linker peptide sequence TGEKP (SEQ ID NO:3). The
peptide sequences described above are also displayed in Table 3.

In the first library (library 1), the 8 zinc finger modules are recombined in random order
to create 3-finger peptides with all possible combinations of the 8 zinc finger modules.
Such a procedure results in a library diversity of 512 (=8^3), comprising peptides that are
predicted to bind to any possible combination of the binding sites assigned in Table 3.
Library 1 allows novel 3-finger domains to be selected as a unit, for specified 9 bp target
sequences. Such 3-finger units may be used for the construction of poly-zinc finger
peptides as described in Moore, M., Choo, Y. & Klug, A. (2001) Proc. Natl. Acad. Sci.
USA 98:1432-1436; and WO 01/53480.
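The library sizes quoted above can be reproduced with a short Python sketch. It is illustrative only: the finger numbers follow Table 3, the triplet for finger 5 (TAA) is an assumption inferred from the #1#5#6 target 5'-GGA-TAA-GCG-3', and `predicted_site` is a hypothetical helper that assumes, as in Zif268, that the N-terminal finger contacts the 3' triplet.

```python
from itertools import product

# Triplet recognised by each module, numbered as in Table 3.
# Sites for fingers 5-8 are predictions; TAA for finger 5 is an
# assumption inferred from the #1#5#6 target 5'-GGA-TAA-GCG-3'.
SITES = {1: "GCG", 2: "TGG", 3: "GGG", 4: "GAG",
         5: "TAA", 6: "GGA", 7: "AAA", 8: "GCC"}


def predicted_site(fingers):
    """Return the 5'->3' target predicted for fingers given N- to C-terminus.

    Assumes the N-terminal finger contacts the 3' triplet, as in Zif268.
    """
    return "".join(SITES[f] for f in reversed(fingers))


# Library 1: every ordered choice of three modules.
library_1 = list(product(SITES, repeat=3))
# Library 2: finger 1 fixed at the N-terminus, two variable positions.
library_2 = [(1, b, c) for b, c in product(SITES, repeat=2)]

print(len(library_1), len(library_2))
print(predicted_site((1, 5, 6)))
```

Because every module in this sketch has a distinct triplet, the 512 peptides of library 1 map onto 512 distinct predicted 9 bp sites.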

In the second library (library 2), the 8 zinc finger modules are randomly recombined to
create 2-finger peptides which are all joined to the C-terminus of Zif268 finger 1. The
invariant finger 1 acts as an anchor for the selection, both by providing extra affinity to
stabilise the selection, and by fixing the register of the protein-DNA interaction (as
discussed supra). Such a library has a diversity of 64 (=8^2), and allows novel 2-finger
units to be selected for a given 6 bp target sequence. The resulting 2-finger units can be
recovered by PCR and used in the construction of poly-zinc finger peptides (based on
strings of 2-finger units), as described in WO 01/53480.

These two libraries (encoding 3-finger peptides) are screened, as described below, for the
ability of their encoded proteins to bind three different 9 bp binding sequences:
5'-GCG-TGG-GCG-3'; 5'-GGA-TAA-GCG-3'; and 5'-GCC-GAG-TGG-3'.

As positive controls, the genes encoding the 3-finger peptides predicted to bind the above 
target sequences are specifically constructed and tested in a similar manner. 
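As a sketch of how the positive-control peptides might be chosen for the three targets, the lookup below inverts the finger-to-triplet assignments. The names `control_fingers` and `in_library_2` are hypothetical, the TAA site for finger 5 is inferred from the #1#5#6 target rather than stated outright, and the N-terminal finger is assumed to read the 3' triplet.

```python
# Triplet recognised by each module, numbered as in Table 3.
# TAA for finger 5 is an assumption inferred from the #1#5#6 target.
SITES = {1: "GCG", 2: "TGG", 3: "GGG", 4: "GAG",
         5: "TAA", 6: "GGA", 7: "AAA", 8: "GCC"}
FINGER_FOR_SITE = {site: number for number, site in SITES.items()}


def control_fingers(target):
    """Return finger numbers (N- to C-terminus) predicted to bind a 9 bp target.

    The target is written 5'->3'; the N-terminal finger is assumed to
    contact the 3' triplet, so the triplets are read in reverse order.
    """
    bases = target.replace("-", "").upper()
    if len(bases) != 9:
        raise ValueError("expected a 9 bp target")
    triplets = [bases[i:i + 3] for i in range(0, 9, 3)]
    return tuple(FINGER_FOR_SITE[t] for t in reversed(triplets))


def in_library_2(fingers):
    """Library 2 carries Zif268 finger 1 at the N-terminal position."""
    return fingers[0] == 1


for target in ("GCG-TGG-GCG", "GGA-TAA-GCG", "GCC-GAG-TGG"):
    fingers = control_fingers(target)
    print(target, fingers, in_library_2(fingers))
```

Under these assumptions the second and third targets give the combinations 1-5-6 and 2-4-8, which match the names of the corresponding binding-site oligonucleotides.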





     FINGER/UNIT   SEQ ID NO:  PEPTIDE SEQUENCE             SITE
1    ZIF268 F1     1979        YACPVESCDRRFSRSDELTRHIRIH    GCG
2    ZIF268 F2     1980        FQCRICMRNFSRSDHLSTHIRTH      TGG
3    Sp1 F3        1981        FSCPICEKRFMRSDHLTKHARRH      GGG
4    WT1 F1        1982        FMCAYPGCNKRYFKLSHLQMHSRKH    GAG
5    O15391        1983        FVCPFDVCNRKFAQSTNLKTHILTH    TAA¹
6    O75626        1984        FKCQTCNKGFTQLAHLQKHYLVH      GGA¹
7    ZN45-AAA      1985        YKCEECGKGFSQASNLLAHQRGH      AAA¹
8    Z165-GCC      1986        YECNECGKSFAESSDLTRHRRIH      GCC¹
9    leader        16          MAEERP
10   linker        3           TGEKP
11   G3S-HA-tag    1989        LRQKDGGGSYPYDVPDYA*

¹ Predicted binding site. * indicates a translation stop codon.



Table 3. Nomenclature, amino acid sequences and known or predicted binding sequences 
("SITE") of zinc finger modules and other peptide units used in library construction. 

a. Human Zinc Finger Mini-Library Construction. 

Two libraries are prepared, according to the scheme shown in Figure 2. The N-terminal
finger of the 3-finger construct is referred to as 'cassette A'. The central finger is
encoded by cassette B, and the third (C-terminal) finger module is called cassette C.

Zinc Finger Cassettes 

Polynucleotide sequences encoding the amino acid sequences of the 8 zinc finger
modules shown in Table 3 are determined, taking into account E. coli codon preferences,
and the corresponding nucleotide sequences are synthesised as single stranded
oligonucleotides, examples of which are shown in Table 4. Also shown are the sequences
of exemplary linkers and an exemplary 3'-tag required for the assembly of 3-finger
domains. Double stranded cassettes encoding the zinc finger modules and relevant
leader, linker, and terminator sequences are generated by PCR according to the procedure
described below, using the appropriate oligonucleotide templates of Table 4, and primers
of Table 5.



X    CODE   FINGER            SEQ ID NO  NUCLEOTIDE SEQUENCE
1    AS144  ZIF268 F1         1990       TATGCGTGCCCGGTGGAAAGCTGCGATCGTCGTTTTAGCCGTAGCGATGAACTGACCCGTCATATTCGTATTCAT
2    AS145  ZIF268 F2         1991       TTTCAGTGCCGTATTTGCATGCGTAACTTTAGCCGTAGCGATCATCTGAGCACCCATATTCGTACCCAT
3    AS148  Sp1 F3            1992       TTTAGCTGCCCGATTTGCGAAAAACGTTTTATGCGTAGCGATCATCTGACCAAACATGCGCGTCGTCAT
4    AS149  WT1 F1            1993       TTTATGTGCGCGTATCCGGGCTGCAACAAACGTTATTTTAAACTGAGCCATCTGCAGatgCATAGCCGTAAACAT
5    AS150  O15391            1994       TTTGTGTGCCCGTTTGATGTGTGCAACCGTAAATTTGCGCAGAGCACCAACCTGAAAACCCATATTCTGACCCAT
6    AS151  O75626            1995       TTTAAATGCCAGACCTGCAACAAAGGCTTTACCCAGCTGGCGCATCTGCAGAAACATTATCTGGTGCAT
7    AS152  ZN45-AAA          1996       TATAAATGCGAAGAATGCGGCAAAGGCTTTAGCCAGGCGAGCAACCTGCTGGCGCATCAGCGTGGCCAT
8    AS153  Z165-GCC          1997       TATGAATGCAACGAATGCGGCAAAAGCTTTGCGGAAAGCAGCGATCTGACCCGTCATCGTCGTATTCAT
9           MAEERP leader     1998       ATGGCGGAAGAACGTCCG
10          TGEKP linker      1999       ACCGGCGAAAAACCG
11          G3S-HA-tag (tag)  2000       CATCTGCGCCAGAAGGACGGCGGCGGCAGCTATCCGTATGATGTGCCGGATTATGCGTAA


Table 4. Nucleotide sequences encoding zinc finger modules and other peptide sequences
used in the construction of 3-finger proteins.
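One way to cross-check Table 4 against Table 3 is to translate each oligonucleotide and compare the result with the listed peptide. The sketch below does this for a few entries using the standard genetic code; the `translate` helper and the `ENTRIES` layout are illustrative assumptions, not part of the described procedure.

```python
# Standard genetic code, built with the bases in T, C, A, G order.
BASES = "TCAG"
AMINO = ("FFLLSSSSYY**CC*W" "LLLLPPPPHHQQRRRR"
         "IIIMTTTTNNKKSSRR" "VVVVAAAADDEEGGGG")
CODON_TABLE = {a + b + c: AMINO[16 * i + 4 * j + k]
               for i, a in enumerate(BASES)
               for j, b in enumerate(BASES)
               for k, c in enumerate(BASES)}


def translate(dna):
    """Translate an in-frame DNA string; '*' marks a stop codon."""
    dna = dna.upper()
    if len(dna) % 3:
        raise ValueError("sequence length is not a multiple of three")
    return "".join(CODON_TABLE[dna[i:i + 3]] for i in range(0, len(dna), 3))


# (name, Table 4 nucleotide sequence, Table 3 peptide sequence)
ENTRIES = [
    ("ZIF268 F2",
     "TTTCAGTGCCGTATTTGCATGCGTAACTTTAGCCGTAG"
     "CGATCATCTGAGCACCCATATTCGTACCCAT",
     "FQCRICMRNFSRSDHLSTHIRTH"),
    ("Sp1 F3",
     "TTTAGCTGCCCGATTTGCGAAAAACGTTTTATGCGTAG"
     "CGATCATCTGACCAAACATGCGCGTCGTCAT",
     "FSCPICEKRFMRSDHLTKHARRH"),
    ("leader", "ATGGCGGAAGAACGTCCG", "MAEERP"),
    ("linker", "ACCGGCGAAAAACCG", "TGEKP"),
]

for name, dna, peptide in ENTRIES:
    status = "ok" if translate(dna) == peptide else "MISMATCH"
    print(name, status)
```

The same check can be extended to the remaining rows of Table 4 by adding them to `ENTRIES`.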



X 


CODE 


NAME 


SEQ ID NO 


SEQUENCE 


1 


AS5 


pETFwdl 


2001 


CGCTGACTTCCGCGTTTCC 


2 


AS86 


SDRev 


2002 


ATGTATATCTCCTTCTTAAAGTT 


3 


AS93 


ZnFlFwd 


2003 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
GGTCCGTATGCGTGCCCGGTGGAAAG 


4 


AS94 


ZnF2Fwd 


2004 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTTTCAGTGCCGTATTTGCATG 
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J 


AobO 


Z/llr JrWCt 


o n a 


iA/AL> 111 J\I\^J\J-\\j L^/AbiiA 1 /A 1 iAL-/A 1 iA 1 LrbrC buAAbAA 

CGTCCGTTTAGCTGCCCGATTTGCG 


zr 
O 


A COA 

Ao9o 


ZjXW 4rWCt 


o n a ^ 
Z U U b 


/AA.L- 111 /\i\Vj/\A.LjVjri\lji\ 1/11 i\C/A 1 i\ 1 LjlcrL.ljL/i/A^AA 

CGTCCGTTTATGTGCGCGTATCCGGG 


1 




Znr 5rwa 


o a A *"7 

zUU / 


AAL 111 AACjAACjCjALtA 1 A 1 AL- A 1 A 1 (jCjt U CjCjAAGAA 


Q 
O 




/jilt Or WQ 


zuuo 


CGTCCGTTTAAATGCCAGACCTGCAAC 


9 


AS99 


ZnF7Fwd 


2009 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA 
CGTCCGTATAAATGCGAAGAATGCGGC


10 


AS 100 


ZnF8Fwd


2010 


AACTTTAAGAAGGAGATATACATATGGCGGAAGAA
GGTCCGTATGAATGCAACGAATGCGGG 


11 


AS101 


1Link1Rev


2011 


CGGTTTTTCGCCGGTATGAATACGAATATGACGGG 


1 o 
1Z 


A Q1 AO 




/ U 1Z 


ppp r p , T i T r P r pppp pptPiT z\ t pi nnT z\ pri a a, t«7v TrtnnTPf 


1 1 
Id 


A Q 1 AO 

Ao 1UJ 


li^inKJJbcev 


ZUIj 


V^orkjrl 111 1 v^.kj^^_v^TVjTX^Al ^/A^w-VjjHt^vjjV^Va-V^^AlVjl 1 l^jjvjj 


1 /! 

14 


A C 1 C\A 

Ao 104 


lJLinK4K.ev 


ZUX'i 


PPP r T ir T ,r r ,r P r nPPPPPP r T 1 2i TPTTTZX PPPPHH^ TPPTi TPT 
U.VjVjt1 111 1 L-0-U.U.kjLjliAl tsj 1 1 AiAv^ioLn-V-. liAl ^jr^iAl ^ 1 

G 




a c 1 
Ao ILD 


ij^iniorcev 


Z Ul 3 


PPPTT'T l T , T n PPPPPPt r rA r PPtPP r PP2\PA i A TPPPT"T r P 

TC 


10 


A C 1 fl£ 


1 i^inKOJbvev 


Zulu 


L-VjtvjI 1111 \^Vj-V_v_LjVjt liA 1 lo^/AV^ ^iAVjxiA 1 /AiA 1 ^sJl 1 1L1 

GC 


1 n 
1 / 


AolU / 


I i_/inic / Jtvev 


oni 7 

Z U 1 / 


PPPT 1 T T T'T 1 P P P P PPT A HP PtPP P Z\ PPP T'Pt 21 "TPtP PtP 


18 


AS 108 


1Link8Rev


2018 


CGGTTTTTCGCCGGTATGAATACGACGATGACGGG 


1 A 

19 


a o 1 r\c\ 

AS 109 


ILinklFwd 


o a i o 


7\ rp 7\ /^t/^t/^r-ir^r^A A a a a /^r^r^ r T , A T i r , PP r pr , r , r | r , pr ,, T , nP7\ 
LAI ALLbbLb/yiA/iA.LLb 1 Ai bLb 1 bLLLbb 1 bbA 

AAG 


10 


A C1 1 A 

AS110 


ILiiiKz^wa 


d a o a 
zUzU 


LAI ACUbrCjL.(^rAAAAAl-CCjl 1 1 LAbTlbTLCtjl Al 1 Ibr 

CATG 


11 


A CI 1 1 1 

Ablll 


lLiiik3Fwd 


O A O "1 


r^AnPA/^r^/^/^/^/^AAAAA nnn | T 1f T ir n7\ nPTTTiPrTiTV rpmrnn 

LAI ACUbrLjCCjAAAAAL.UCjl 1 lAbrLlbrLCCbjAl 1 ICj 

CG 


12 


A O 1 1 O 

Abll2 


lLinlc4rwa 


O A O O 

zUzz 


LAI ACCvjCjCCjAAAAACL.(^t1 1 1A1 brlCjL.CjCbilAl LL 

GGG 


13 


A O 1 1 O 

AS113 


ILinlorwa 


<d U2 J 


LAXALLbbLbAAAAALLbl 1 IbrlbriCjL-UUCjl 1 IbrA 

TGTG 


14 


A O 1 1 A 

AS 114 


-it" "i _ s — „ J 

lLmkoFwa 


2 024 


CATACLCjGCGAAAAALCGTiTAAATGCCAGACCTG 
CAAC 


15 


A O 1 1 C 

AS115 


lLink/Fwa 


O A O IT 

2 02b 


LA 1 AC CCjGCGAAAAAL- C Cj 1 A 1 AAA 1 CjCCjAACjAA 1 
GGGC 


16 


ASllo 


lLink8Fwa 


o nor 

2 02 6 


CATACCGGCGAAAAA(_C(ji 1 A 1 GAAI GCAACGAATG 
CGGC 


1 T 
1 / 


AC117 

Ac>l 1 / 


zivinlclKev 


ZUz / 


r PPr | r ( TTP r rPA ppppTfiTPA tp/\ a ta pp a a t 1 a t 1 ^ a p 
GGGTC 


18 


AS118 


2Link2Rev 


2028 


TGGCTTCTCACCCGTGTGATGGGTACGAATATGGG 
TGC 


19 


AS119 


2Link3Rev 


2029 


TGGCTTCTCACCCGTGTGATGACGACGCGCATGTT 
TGG 


20 


AS 120 


2Link4Rev 


2030 


TGGCTTCTCACCCGTGTGATGTTTACGGCTATGCA 
TCTG 
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O 1 

21 


A C101 


2JLinK5Kev 


n n q i 
Z U J ± 


r Pf l r l r ,r P , T , r ir Pr , 7\ pn/^iprnrtmp7i ry-i^—i/^i^rp/— it\ s-i-n, j\ rj%-j\ rp^-i/^t^-i 
X brbrb, 11L1 CACL.L,br X br X brA X brbrbr 1 CAbxAAX A 1 brbrbr 

TTTTC 


ZZ 


API OO 

Ab Izz 


2JLuiK:oK.ev 


o n *3 o 


Xbrbrv^ 11L1 LALLLblblbAlbLALLAbAlAfilbl 1 

TCTGC 


zi 


A C 1 


zLirLK /Kev 


o n "3 "3 


Xbrvjl^X 1L XvwAv w v^b*brXbTXbrAXbrb7CCACbTL.Xbr^ 


24 


AS 124 


2Link8Rev 


2034 


TGGCTTCTCACCCGTGTGATGAATACGACGATGAC 
Cine* 


25 


AS 125 


2LinklFwd 


2035 


CACGGGTGAGAAGCCATATGCGTGCCCGGTGGAAA 
ex 


26 


AS 126 


2Link2Fwd 


2036 


CACGGGTGAGAAGCCATTTCAGTGCCGTATTTGCA 


27 


AS127 


2Link3Fwd 


2037 


CACGGGTGAGAAGCCATTTAGCTGCCCGATTTGCG 


28 


A O 1 O O 


2J^inK4Jpwa 


O A O O 


L AL bbb 1 bAbAALrL LA 1 X 1 A J. brX brbbrbbrX Al CCLrbr 

G 


29 


AS 129 


zLinlCDrwa 


n n o q 

zu 


CAbbrbr^XbrAbrAAbTb-CAX X X br X brX bxbbCbr X X XbrAXbr 
TG 


30 


AS 130 


2Link6Fwd 


2040 


CACGGGTGAGAAGCCATTTAAATGCCAGACCTGCA 

AC 


31 


AS131 


2Link7Fwd


2041 


CACGGGTGAGAAGCCATATAAATGCGAAGAATGCG 
GC 


32 


AS 132 


2Link8Fwd 


2042


CACGGGTGAGAAGCCATATGAATGCAACGAATGCG
GC 


33 


AS 133 


3HAlRev 


2043 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG
AATACGAATATGACGGGTC 


34 


AS 134 


3HA2Rev 


2044 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG
GGTACGAATATGGGTGC 


35 


AS 135 


3HA3Rev 


2045 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
ACGACGCGCATGTTTGG 


36 


AS 136 


3HA4Rev 


2046 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
TTTACGGCTATGCATCTG 


37 


AS 137 


3HA5Rev 


2047 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
GGTCAGAATATGGGTTTTC


38 


AS 138 


3HA6Rev 


2048 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 

GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG

CACCAGATAATGTTTCTGC 


39 


AS 139 


3HA7Rev 


2049 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
GCCACGCTGATGCGC 


40 


AS 140 


3HA8Rev 


2050 


CTAGGAATTCTTACGCATAATCCGGCACATCATAC 
GGATAGCTGCCGCCGCCGTCCTTCTGGCGCAGATG 
AATACGACGATGACGGG 
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41 


AS141 


Rev3 


2051 


CTAGGAATTCTTACGCATAATC 


42 


AS 142 


1LinkRev


2052 


CGGTTTTTCGCCGGTATG 


43 


AS 143 


2LinkRev 


2053 


TGGCTTCTCACCCGTGTG 



Table 5. Modifying oligonucleotides used for mini-library construction. 

1. Library 1.

Once made into double stranded DNA cassettes, the finger units are attached to T7 
upstream expression sequences by PCR overlap extension, using the following protocol. 

(a) Upstream sequences are first extracted from pET23a by PCR using primers
pETFwd1 and SDRev, generating the fragment pET5'.

(b) The fingers for cassette A are amplified with forward primers ZnFxFwd
(AS93-AS100) and reverse primers 1LinkxRev (AS101-AS108), where x is the number of a
particular finger from Tables 3 and 4, as indicated.

(c) The fingers for cassette B are amplified with forward primers 1LinkxFwd
(AS109-AS116) and reverse primers 2LinkxRev (AS117-AS124), where x refers to the
finger module number.

(d) The fingers for cassette C are amplified with forward primers 2LinkxFwd
(AS125-AS132) and reverse primers 3HAxRev (AS133-AS140), where x refers to the
appropriate zinc finger module.

The steps to create cassettes A, B and C are performed separately; however, mixed
populations of template oligonucleotides can be added to each PCR of steps (a), (b), and
(c) to produce a library of each cassette.
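The design of the cassette primers can be illustrated with a small check on one of them. The sketch below takes the reverse primer listed as AS101 in Table 5 and confirms that, read on the sense strand, it consists of the 3' end of the Zif268 finger 1 template (AS144, Table 4) followed by the TGEKP linker codons. The helper names are hypothetical and the split into an annealing part and a linker tail is an interpretation of the sequences, not a statement from the protocol.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


FINGER_1 = ("TATGCGTGCCCGGTGGAAAGCTGCGATCGTCGTTTTAG"
            "CCGTAGCGATGAACTGACCCGTCATATTCGTATTCAT")    # AS144, Table 4
LINKER = "ACCGGCGAAAAACCG"                               # TGEKP, Table 4
PRIMER_AS101 = "CGGTTTTTCGCCGGTATGAATACGAATATGACGGG"     # AS101, Table 5


def split_reverse_primer(primer, template, tail):
    """Split a reverse primer into (annealing part, appended tail).

    Both parts are returned on the sense strand. Raises ValueError if the
    primer does not end in `tail` or does not match the template's 3' end.
    """
    sense = revcomp(primer)
    if not sense.endswith(tail):
        raise ValueError("primer does not carry the expected tail")
    annealing = sense[:-len(tail)]
    if not template.endswith(annealing):
        raise ValueError("primer does not match the template 3' end")
    return annealing, tail


annealing, tail = split_reverse_primer(PRIMER_AS101, FINGER_1, LINKER)
print(len(annealing), annealing)
print(len(tail), tail)
```

On this reading, amplifying finger 1 with this primer yields a cassette A product that ends in the linker sequence, which is what later allows it to overlap with cassette B.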

The final 3-finger library is assembled by overlap extension as outlined in Figure 2. In
the first step the mixed pool of cassette A is appended to the upstream sequences, pET5'.
Equimolar amounts are mixed and PCR-cycled in the absence of primers. The reaction
product is either purified immediately or reamplified before purification using primers
pETFwd1 and 1LinkRev.

In the second step cassette B (mixed pool) is appended to the product of the above step.
Again, equimolar amounts are mixed and PCR-cycled in the absence of primers. The
reaction product is either purified immediately or reamplified before purification using
primers pETFwd1 and 2LinkRev.

In the final step cassette C (mixed pool) is appended to the above product. Equimolar
amounts are mixed and PCR-cycled in the absence of primers. As before, the reaction
product may be purified immediately or reamplified before purification using primers
pETFwd1 and Rev3 (see also Figure 2).
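The stepwise joining described above can be mimicked in silico by merging two fragments at their shared end sequence. The sketch below is a simplification under stated assumptions: `overlap_extend` is a hypothetical helper, the cassettes are reduced to a finger plus linker (no upstream pET5' fragment, leader or tag), and the 18-nucleotide overlap used here is the reverse complement of primer 1LinkRev from Table 5.

```python
FINGER_1 = ("TATGCGTGCCCGGTGGAAAGCTGCGATCGTCGTTTTAG"
            "CCGTAGCGATGAACTGACCCGTCATATTCGTATTCAT")   # AS144, Table 4
FINGER_2 = ("TTTCAGTGCCGTATTTGCATGCGTAACTTTAGCCGTAG"
            "CGATCATCTGAGCACCCATATTCGTACCCAT")         # AS145, Table 4
LINKER = "ACCGGCGAAAAACCG"                              # TGEKP, Table 4


def overlap_extend(upstream, downstream, min_overlap=15):
    """Join two fragments at the longest shared end/start sequence.

    Models primer-free overlap extension: the 3' end of `upstream` must
    equal the 5' start of `downstream` over at least `min_overlap` bases.
    """
    longest = min(len(upstream), len(downstream))
    for k in range(longest, min_overlap - 1, -1):
        if upstream.endswith(downstream[:k]):
            return upstream + downstream[k:]
    raise ValueError("fragments share no sufficient overlap")


# Simplified cassettes (assumption): A ends with the linker, and B starts
# with 'CAT' plus the same linker, i.e. the reverse complement of 1LinkRev.
cassette_a = FINGER_1 + LINKER
cassette_b = "CAT" + LINKER + FINGER_2

joined = overlap_extend(cassette_a, cassette_b)
print(len(joined))
```

In Table 5 the primers for the second junction (2LinkRev, 2LinkxFwd) carry a differently encoded TGEKP linker, which presumably keeps each junction specific to its intended partner so that cassettes join only in the order A, B, C.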

2. Library 2.

Library 2 is assembled in a similar manner to Library 1 except that cassette A is
represented by Zif268 finger 1 only.

The final PCR products containing T7 promoter sequences and encoding 3-finger
peptides attached to an HA-antibody tag are purified and used for the production of
protein.

b. Zinc Finger Library Screening.

Two exemplary methods for screening zinc finger libraries, such as those produced 
above, are described in Protocol A and Protocol B, below. 




Protocol A: 

The peptides of library 1 and library 2 are screened to select 3-zinc finger domains which
bind the sequences: 5'-GCG-TGG-GCG-3'; 5'-GGA-TAA-GCG-3'; and 5'-GCC-GAG-TGG-3'.
Since library 2 contains Zif268 finger 1 in the N-terminal position, in theory,
these peptides should only bind the sequences 5'-GCG-TGG-GCG-3' and
5'-GGA-TAA-GCG-3'. Hence, library 2 is effectively used to select the 2-finger units
which bind most strongly to the 6 bp sequences 5'-GCG-TGG-3' and 5'-GGA-TAA-3'. Double
stranded binding sites for use in the selection protocol are generated by annealing the
complementary oligonucleotides: Zif.b site and Zif site RC (AS154 and AS155);
#1#5#6.b and #1#5#6 RC (AS156 and AS157); and #2#4#8.b and #2#4#8 RC (AS158
and AS159). The top strand of each binding site is biotinylated, allowing capture of
binding site/zinc finger/HA-antibody ternary complexes to the streptavidin-coated plate in
an ELISA screening assay. The oligonucleotides are displayed in Table 6, below.



X   Code    Name         SEQ ID NO  Sequence
1   AS154   Zif.b site   2054       TTTTTTTTTTGCGTGGGCGTTTTTTTTTT
2   AS155   Zif site RC  2055       AAAAAAAAAACGCCCACGCAAAAAAAAAA
3   AS156   #1#5#6.b     2056       TTTTTTTTTTGGATAAGCGTTTTTTTTTT
4   AS157   #1#5#6 RC    2057       AAAAAAAAAACGCTTATCCAAAAAAAAAA
5   AS158   #2#4#8.b     2058       TTTTTTTTTGCCTGTTGGTTTTTTTTTTT
6   AS159   #2#4#8 RC    2059       AAAAAAAAAAACCAACAGGCAAAAAAAAA



Table 6. Oligonucleotide sequences used to generate double stranded binding sites used 
in the selection procedure. 
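The oligonucleotide pairs in Table 6 follow a simple pattern: a 9 bp site flanked by poly-T runs on the top strand, and the reverse complement as the bottom strand. A minimal sketch of that construction is given below; the function name `site_oligos` and the default flank length of 10 are assumptions made for illustration.

```python
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def revcomp(seq):
    """Reverse complement of an upper-case DNA string."""
    return seq.translate(COMPLEMENT)[::-1]


def site_oligos(site, flank=10):
    """Return (top, bottom) strands for a binding site with poly-T flanks.

    `site` is written 5'->3' and may contain hyphens between triplets.
    The bottom strand is the reverse complement of the top strand, so the
    two anneal into a blunt double stranded binding site.
    """
    core = site.replace("-", "").upper()
    top = "T" * flank + core + "T" * flank
    return top, revcomp(top)


top, bottom = site_oligos("GGA-TAA-GCG")
print(top)
print(bottom)
```

In the protocol only the top strand is biotinylated, so in practice the two strands would be ordered separately rather than derived from one another at the bench.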


The PGR- amplified 3 -finger constructs are gel-purified from a 1% TAE-agarose gel using 
the Gel Extraction Kit (Qiagen) and quantified based on absorbance at 260 nM. Dilutions 
(in 0.25 mg/ml X DNA) of DNA template encoding for either library 1 or 2 are prepared 
25 at the final total template concentration of 4.2 fM and 1 fM, respectively. At these 

concentrations 1 \xl of template contains approximately 2500 and 600 molecules of library 
1 or library 2, respectively. At such low concentrations, such samples must be PGR 
amplified to generate enough template for protein expression. Hence, these 1 |nl aliquots 
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are taken and added to 1 ml PCR pre~mix, containing primers Rev3 (AS 141) and 
pETFwd2 (primer sequences shown below, see Table 7). The PCR pre-mixes are then 
aliquoted into 96 (or 384) well plates at 10 jlxI per well, which is the equivalent of 
approximately 25 or 6 molecules of library 1 or library 2 template, respectively. 
5 Templates are amplified using 30 cycles of PCR. After this first round of PCR, 0.5 jlxI 
aliquots of PCR product are added to new 10 \il PCR pre-mixes (in 96 or 384 well 
format), containing nested primers, pETFwd3 and Rev3, and amplified for another 30 
cycles. The resultant product is concentrated enough to perform in vitro transcription / 
translation. 
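
The dilution figures quoted above can be checked with a short calculation. The following Python sketch is illustrative only: the function names are hypothetical, and it assumes that the 1 µl sample volume is negligible relative to the 1 ml pre-mix.

```python
AVOGADRO = 6.02214076e23  # molecules per mole


def molecules_in_volume(molar_conc, volume_litres):
    """Number of template molecules in a given volume at a given molarity."""
    return molar_conc * volume_litres * AVOGADRO


def molecules_per_well(molar_conc, sample_ul=1.0, premix_ul=1000.0, well_ul=10.0):
    """Average molecules per well when sample_ul of template is added to
    premix_ul of PCR pre-mix and dispensed in well_ul aliquots.
    Assumption: the sample volume itself is ignored in the dilution."""
    total = molecules_in_volume(molar_conc, sample_ul * 1e-6)
    return total * well_ul / premix_ul


# Concentrations taken from the text: 4.2 fM (library 1) and 1 fM (library 2).
for name, conc in (("library 1", 4.2e-15), ("library 2", 1.0e-15)):
    per_ul = molecules_in_volume(conc, 1e-6)
    per_well = molecules_per_well(conc)
    print(f"{name}: about {per_ul:.0f} molecules per ul, "
          f"about {per_well:.0f} per 10 ul well")
```

Under these assumptions the calculation gives roughly 2500 and 600 molecules per µl, and roughly 25 and 6 molecules per well, in line with the approximate values stated in the text.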

In vitro translation experiments using TNT PCR coupled transcription-translation mix
(Promega) are assembled according to the manufacturer's instructions. Typically, a 5 µl
final volume contains 1 µl of each PCR product and 4 µl rabbit reticulocyte pre-mix
(containing 20 µM methionine, 12.5 µg/ml λ Hind III digest (Roche), 500 µM ZnCl2
(Sigma), 0.7 µl H2O, 40 11M PCR-amplified DNA template). Reactions are incubated at
30°C for 90 minutes. 50 µl PBS binding buffer containing 0.1% BSA (Sigma), 0.5%
Tween 20 (Sigma), 50 µM ZnCl2, 10 nM of the appropriate biotinylated binding site and 25
pXJ/ml rat 3F10 anti-HA HRP conjugate (Roche) is added to the translation mix and
incubated for 45 minutes at room temperature. The binding mix is thereafter transferred
to pre-blocked black streptavidin-coated 8-well strips or 96/384 well plates (Roche), and
the ternary complexes containing 3-finger peptide, biotinylated binding site and anti-HA
HRP antibody are captured while shaking at 200 rpm for 45 minutes at room temperature.
The wells are then washed five times with 100 µl PBS binding buffer containing 0.1%
BSA (Sigma), 0.5% Tween 20 (Sigma) and 50 µM ZnCl2 to remove unbound components.
Finally, the retained HRP activity is measured by adding 50 µl QuantaBlu fluorogenic
HRP substrate (Pierce). Figure 3 demonstrates the capture and detection of target site-
binding zinc finger peptides using the assay described. Fluorescence is measured on a
SpectraMax Gemini XS (Molecular Devices) fluorescence microplate reader at 320 nm
excitation, 433 nm emission and 420 nm cut-off values.

The wells that give the highest levels of fluorescence are those which contain the highest
number of, or the tightest binding, 3-finger peptides. PCR products from the second PCR
amplification stage, corresponding to such samples, are purified from TAE-agarose gels
and quantified, as above. Pure PCR products are diluted to approximately 50 molecules
per µl (which is equivalent to approximately 100 aM concentration) in 0.25 mg/ml λ
DNA. As above, 1 µl samples of template are added to 1 ml PCR pre-mix containing
primers pETFwd4 and Rev3. 10 µl aliquots are placed in each well of a 96 well plate.
At this stage, there are (on average) 0.5 template molecules per aliquot. Therefore,
generally speaking, half of the samples will contain no template and half will contain a
single template molecule. Samples are then PCR amplified using 30 cycles. Again, 0.5
µl PCR samples are taken from each well and amplified again by 30 cycles of PCR using
the nested primers pETFwd5 and Rev3. 1 µl of each of these PCR products is used for
protein expression, as described above. At this stage, the highest levels of fluorescence
correspond to the samples containing the tightest binding 3-finger peptides. The PCR
product encoding such peptides is purified, as before, and can be sequenced to determine
the protein sequence of the optimal 3-zinc finger domain for the appropriate binding site.
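
The limiting-dilution step can be examined numerically. The Python sketch below is a hedged illustration, not part of the described protocol: it assumes that template molecules are distributed among aliquots according to a Poisson distribution (an assumption not stated in the text), and all names are hypothetical.

```python
import math

AVOGADRO = 6.02214076e23  # molecules per mole


def poisson_pmf(k, mean):
    """Probability of exactly k template molecules in an aliquot."""
    return math.exp(-mean) * mean ** k / math.factorial(k)


# 50 molecules per microlitre expressed as a molar concentration.
molarity = 50 / (AVOGADRO * 1e-6)

# 1 ul (about 50 molecules) into 1 ml, dispensed as 10 ul aliquots.
mean_per_aliquot = 50 * 10 / 1000

p_empty = poisson_pmf(0, mean_per_aliquot)
p_single = poisson_pmf(1, mean_per_aliquot)
p_multiple = 1.0 - p_empty - p_single

print(f"molarity: {molarity:.1e} M")
print(f"mean templates per aliquot: {mean_per_aliquot}")
print(f"empty: {p_empty:.2f}, single: {p_single:.2f}, multiple: {p_multiple:.2f}")
```

Under the Poisson assumption, somewhat more than half of the aliquots are expected to be empty and most of the remainder to hold a single template, which is broadly consistent with the "half and half" description above. The calculated molarity is slightly below 100 aM, consistent with the approximate figure given.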

If further rounds of selection are required, PCR amplification can be conducted with the 
nested primers pETFwd6, pETFwd9 and pETFwd7, also shown below (Table 7). 



NAME     SEQ ID NO  SEQUENCE
pETFwd1  2060       CGCTGACTTCCGCGTTTCC
pETFwd2  2061       TCCAGACTTTACGAAACACGG
pETFwd3  2062       CGAAGACCATTCATGTTGTTGC
pETFwd4  2063       GTCGCAGACGTTTTGCAGC
pETFwd5  2064       GCAGTCGCTTCACGTTCGC
pETFwd6  2065       CGCTCGCGTATCGGTGATTC
pETFwd9  2066       CATTCTGCTAACCAGTAAGGC
pETFwd7  2067       GCCTAGCCGGGTCCTCAAC



Table 7: Primers used for PCR amplification of 3-finger cassettes (as constructed by the
procedure of Figure 2) to provide template used in screening zinc finger libraries.

Protocol B:

The peptides of library 2 were screened to select 3-zinc finger domains which bind the
sequences 5'-GCG-TGG-GCG-3' and 5'-GGG-AGG-CCT-3'. Double stranded binding
sites for use in the selection protocol were generated by annealing the complementary
oligonucleotides: Zif.b site and Zif site RC (AS154 and AS155, shown above), which
generated the 5'-GCG-TGG-GCG-3' binding site; and the oligonucleotides
5'-TTTTTTTTTTGGGAGGCCTTTTTTTTTTT-3' (SEQ ID NO:2123) and
5'-AAAAAAAAAAAGGCCTCCCAAAAAAAAAA-3' (SEQ ID NO:2124), which
generated the 5'-GGG-AGG-CCT-3' binding site. The top strand of each binding site
was biotinylated, allowing capture of binding site/zinc finger/HA-antibody ternary
complexes onto the streptavidin-coated plate in an ELISA screening assay.

The 3-finger library 2 constructs were cloned into the multiple cloning site of vector
pET23a (Novagen), using appropriate restriction sites. This library was then transformed
into E. coli and plated out to grow single colonies. 384 colonies (which should represent
the vast majority of the 64 member library) were picked into 2xYT media with ampicillin
and cultures grown at 37°C overnight. Library 2 expression cassettes were recovered
from bacteria by PCR using primers pETFwdx (where x is 1-7, e.g. pETFwd1) and Rev3
as described in Protocol A above.

In vitro coupled transcription / translation of PCR products was conducted as described
above, with the difference that each of the 384 zinc finger peptides was screened
individually in a well of a 384 well plate. The library was screened against the 5'-GCG-
TGG-GCG-3' and 5'-GGG-AGG-CCT-3' binding sites, as detailed in Protocol A. Wells
that yielded the highest levels of fluorescence were those which contained the tightest
binding 3-finger peptides. The ELISA results from the screen of the 384 samples against
the 5'-GCG-TGG-GCG-3' site are shown in Figure 4. Six constructs displayed
significant binding to the target site and these are termed C8, G16, I19, I23, J19 and K19
according to their coordinates on the 384-well plate. Similarly, one construct (B10)
showed strong binding to the 5'-GGG-AGG-CCT-3' target site. PCR products encoding
the tightest binding peptides can be purified, as described supra, and sequenced.

Some of the selected constructs, namely C8, J19, K19, I23 and G16 (which bind the
5'-GCG-TGG-GCG-3' site) and B10 (which binds the 5'-GGG-AGG-CCT-3' site), were
screened against a range of different binding sites to test their specificity. The sites used
were: 5'-GCG-TGG-GCG-3'; 5'-CCA-CTC-GGC-3'; 5'-CCT-AGG-GGG-3'; 5'-GGA-
TAA-GCG-3'; 5'-GGG-AGG-CCT-3'; 5'-GCG-TAA-GGA-3'; and 5'-GCG-GGG-
GGA-3'. The binding assay was conducted as described above. The results (Figure 5)
show that the selected 3-zinc finger peptides bind preferentially to their target site, in
comparison to the alternative binding sites tested.

Example 5: Human Zinc Finger Module Libraries for Rapid Selection of 2-Finger
Units.

The preferred subunits of a poly-zinc finger construction strategy are in the form of two-
finger sub-domains. Assuming that there are 1,000 individual natural finger modules, a
library of all combinations of such zinc finger modules, in 2-finger units, would contain
1,000,000 members. All of the 1,000 natural finger modules would have to be made from
oligonucleotides, and the expense would be considerable. Furthermore, this figure is
likely to be an underestimate of the number of natural fingers. Hence, due to the huge
number of natural, human zinc finger modules available, it is advantageous to limit the
size of the libraries screened, as discussed in the Description. One way in which library
size can be reduced is to limit the library members to zinc finger modules which are
predicted to bind the desired sequence. For instance, based on the target sites in Example
1, if 2-finger domains are required to bind the sequence 5'-GCG-TGG-3', an individual
library can be constructed from the zinc finger modules predicted to bind the sequences
5'-GCG-3' and 5'-TGG-3'. Equally, if the sequence 5'-GGA-TAA-3' is to be targeted,
zinc finger modules predicted to bind the sequences 5'-GGA-3' and 5'-TAA-3' can
be used. Table 8 shows the natural, human zinc finger modules from Example 1, which
are predicted to bind the aforementioned 3 bp sequences.






5'-GCG-3'              5'-TGG-3'              5'-GGA-3'        5'-TAA-3'
Zif268 finger 1 (GCG)  Zif268 finger 2 (TGG)  BCL6 (NGA)       TYY1 (NAA)
Zif268 finger 3 (GCG)  MAZ finger 2 (TGG)     O75626 (GGA)     O15391 (YAA)
Sp1 finger 2 (GCG)     WT1 finger 3 (TGG)     ZN45 (N N/T A)   O75626 (YAA)
WT1 finger 4 (GCG)     SP4 (NGG)              O15535 (GNA)     ZN45 (N N/T A)
BTE1 (GCG)             BTE1 (NGG)             Q15776 (GNA)     Z136 (TNN)
O43296 (GNG)           Z136 (TNN)             O60893 (GNA)     Z239 (YAA)
Z174 (GCG, RNA)        Q15776 (NGG)           Z132 (a) (GGA)   Q15776 (a) (TNA)
Z202 (GCG, RNA)        ZN84 (YGG)             Z132 (b) (GGA)   Q15776 (b) (TNA)
-                      -                      Z132 (GGN)       Z195 (YAA)
-                      -                      ZN85 (GGA)       ZN84 (YAA)
-                      -                      -                O75346 (TAA)
-                      -                      -                ZN43 (TAA)



Table 8. The natural, human zinc finger modules predicted to bind the sequences
5'-GCG-3', 5'-TGG-3', 5'-GGA-3' and 5'-TAA-3'.

On the basis of the specificities shown in Table 5, a library of 2-finger units to target the 6
bp sequence 5'-GCG-TGG-3' has 64 (8x8) members, and a library to target the sequence
5'-GGA-TAA-3' has 120 (10x12) members. To screen sample sizes of this magnitude
we can construct each 2-finger unit specifically (using, for example, an 8x8 or 10x12
matrix arrangement), and assay the samples containing individual clones using the
fluorescent-ELISA protocol of Example 4. Such a procedure can save time in
comparison to constructing all possible 64 or 120 variants in a random fashion (as a
library), as described in Example 4, because with a random library the number of
constructs screened would have to be considerably higher to cover every variant.
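
The difference between the matrix approach and random sampling can be estimated as follows. This Python sketch is illustrative only: it assumes that every variant is equally represented in the random library and that clones are picked independently (assumptions not stated in the text), and the function names are hypothetical.

```python
def expected_coverage(n_variants, n_picked):
    """Expected fraction of equally likely variants seen at least once
    after n_picked independent random picks."""
    return 1.0 - (1.0 - 1.0 / n_variants) ** n_picked


def picks_for_coverage(n_variants, target):
    """Smallest number of random picks whose expected coverage reaches target."""
    picks = 0
    while expected_coverage(n_variants, picks) < target:
        picks += 1
    return picks


for size in (64, 120):
    needed = picks_for_coverage(size, 0.99)
    print(f"{size}-member library: matrix needs {size} constructs, "
          f"random picking needs about {needed} clones for 99% expected coverage")

# Picking 384 clones from a 64-member library, as in Example 4, Protocol B.
print(f"coverage with 384 picks of 64: {expected_coverage(64, 384):.4f}")
```

Under these assumptions, several times more random clones than library members are needed to approach complete coverage, whereas the matrix arrangement requires exactly one construct per variant.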

a. Construction of 2-Finger Domains to Bind 5'-GCG-TGG-3' 

A 64 member, 2-finger library is constructed from the natural, human zinc finger modules
predicted to bind the sequences 5'-GCG-3' and 5'-TGG-3' (Table 8, columns 1 and 2).
The 2-finger library units are all attached to the C-terminus of Zif268 finger 1, which acts
as an anchor finger. The construction protocol differs from that described in
Example 4, as described below.

Zinc Finger Cassettes

Nucleotide sequences encoding the amino acid sequences of the 16 zinc finger modules
(Table 8, columns 1 and 2) are determined, taking into account human codon preferences,
and the corresponding nucleotide sequences are synthesised as single stranded
oligonucleotides, shown in Table 9. Double stranded cassettes encoding the zinc finger
modules and flanking linker sequences are generated by PCR using the appropriate
primers, shown in Table 10.



No.  FINGER     SEQ ID NO  NUCLEOTIDE SEQUENCE
1    Zif268 F1  2068       TACGCCTGCCCCGTGGAGAGCTGCGACCGCCGCTTCAG
                           CCGCAGCGACGAGCTGACCCGCCACATCCGCATCCAC
2    Zif268 F3  2069       TTCGCCTGCGACATCTGCGGCCGCAAGTTCGCCCGCAG
                           CGACGAGCGCAAGCGCCACACCAAGATCCAC
3    Sp1 F2     2070       TTCGCCTGCAGCTGGCAGGACTGCAACAAGAAGTTCGC
                           CCGCAGCGACGAGCTGGCCCGCCACTACCGCACCCAC
4    WT1 F4     2071       TTCAGCTGCCGCTGGCCCAGCTGCCAGAAGAAGTTCGC
                           CCGCAGCGACGAGCTGGTGCGCCACCACAACATGCAC
5    BTE1       2072       TTCCCCTGCACCTGGCCCGACTGCCTGAAGAAGTTCAG
                           CCGCAGCGACGAGCTGACCCGCCACTACCGCACCCAC
6    O43296     2073       TACGAGTGCGTGGAGTGCGGCAAGGCCTTCACCCGCAT
                           GAGCGGCCTGACCCGCCACAAGCGCATCCAC
7    Z174       2074       TACAAGTGCGACGACTGCGGCAAGAGCTTCACCTGGAA
                           CAGCGAGCTGAAGCGCCACAAGCGCGTGCAC
8    Z202       2075       TACCGCTGCGACGACTGCGGCAAGCACTTCCGCTGGAC
                           CAGCGACCTGGTGCGCCACCAGCGCACCCAC
9    Zif268 F2  2076       TTCCAGTGCCGCATCTGCATGCGCAACTTCAGCCGCAG
                           CGACCACCTGAGCACCCACATCCGCACCCAC
10   MAZ F2     2077       TACAACTGCAGCCACTGCGGCAAGAGCTTCAGCCGCCC
                           CGACCACCTGAACAGCCACGTGCGCCAGGTGCAC
11   WT1 F3     2078       TTCCAGTGCAAGACCTGCCAGCGCAAGTTCAGCCGCAG
                           CGACCACCTGAAGACCCACACCCGCACCCAC
12   Sp4        2079       CACAAGTGCCCCTACAGCGGCTGCGGCAAGGTGTACGG
                           CAAGAGCAGCCACCTGAAGGCCCACTACCGCGTGCAC
13   BTE1       2080       CACAAGTGCCCCTACAGCGGCTGCGGCAAGGTGTACGG
                           CAAGAGCAGCCACCTGAAGGCCCACTACCGCGTGCAC
14   Z136       2081       TTCGAGTGCAAGCGCTGCGGCAAGGCCTTCCGCAGCAG
                           CAGCAGCTTCCGCCTGCACGAGCGCACCCAC
15   Q15776     2082       TACGAGTGCGACGAGTGCGGCAAGACCTTCCGCCGCAG
                           CAGCCACCTGATCGGCCACCAGCGCAGCCAC
16   ZN84       2083       TACGAGTGCGGCGAGTGCGGCAAGGCCTTCAGCCGCAA
                           GAGCCACCTGATCAGCCACTGGCGCACCCAC


1 RNA Binding. 

Table 9. Nucleotide sequences of zinc finger modules and nucleotide sequences encoding
other peptide sequences used in the construction of peptides to bind the sequence
5'-GCG-TGG-3'.

The primers used to amplify the N-terminal finger of the pair (the equivalent of cassette
B, above) add TGEKP (SEQ ID NO:3) linker sequences, and the restriction site XmaI
(5'-CCC-GGG-3') at the 5' end and an AgeI site (5'-ACC-GGT-3') at the 3' end. AgeI and
XmaI create compatible ends, but have unique restriction sites. These primers are called
CasBxFwd and CasBxRev, respectively, where x refers to the number of the zinc finger
module in Table 9. The primers used to amplify the C-terminal finger of the pair (the
equivalent of cassette C, above) add TGEKP (SEQ ID NO:3) linker sequences, and the
restriction site XmaI at the 5' end and a sequence encoding LRQKDGGGS (SEQ ID
NO:2125), containing a restriction site for BamHI, at the 3' end. These primers are
referred to as CasCxFwd and CasCxRev, respectively. The 16 individual zinc finger
cassettes are then purified using the QIAquick PCR purification kit (Qiagen).



Name       SEQ ID NO  Sequence
CasB9Fwd   2084       GATCCCCGGGGAGAAGCCCTTCCAGTGCCGCATCTGCAT
CasB10Fwd  2085       GATCCCCGGGGAGAAGCCCTACAACTGCAGCCACTGCGG
CasB11Fwd  2086       GATCCCCGGGGAGAAGCCCTTCCAGTGCAAGACCTGCCA
CasB12Fwd  2087       GATCCCCGGGGAGAAGCCCCACAAGTGCCCCTACAGCG
CasB13Fwd  2088       GATCCCCGGGGAGAAGCCCCACAAGTGCCCCTACAGCG
CasB14Fwd  2089       GATCCCCGGGGAGAAGCCCTTCGAGTGCAAGCGCTGCG
CasB15Fwd  2090       GATCCCCGGGGAGAAGCCCTACGAGTGCGACGAGTGCG
CasB16Fwd  2091       GATCCCCGGGGAGAAGCCCTACGAGTGCGGCGAGTGCG
CasC1Fwd   2092       GATCCCCGGGGAGAAGCCCTACGCCTGCCCCGTGGAG
CasC2Fwd   2093       GATCCCGGGGGAGAAGCCCTTCGCCTGCGACATCTGCG
CasC3Fwd   2094       GATCCCCGGGGAGAAGCCCTTCGCCTGCAGCTGGCAGG
CasC4Fwd   2095       GATCCCCGGGGAGAAGCCCTTCAGCTGCCGCTGGCCC
CasC5Fwd   2096       GATCCCCGGGGAGAAGCCCTTCCCCTGCACCTGGCCC
CasC6Fwd   2097       GATCCCCGGGGAGAAGCCCTACGAGTGCGTGGAGTGCG
CasC7Fwd   2098       GATCCCCGGGGAGAAGCCCTACAAGTGCGACGACTGCGG
CasC8Fwd   2099       GATCCCCGGGGAGAAGCCCTACCGCTGCGACGACTGCG
CasB9Rev   2100       CTTCTCACCGGTGTGGGTGCGGATGTGGGTG
CasB10Rev  2101       CTTCTCACCGGTGTGCACCTGGCGCACGTG
CasB11Rev  2102       CTTCTCACCGGTGTGGGTGCGGGTGTGGGT
CasB12Rev  2103       CTTCTCACCGGTGTGCACGCGGTAGTGGGC
CasB13Rev  2104       CTTGTCACCGGTGTGCACGCGGTAGTGGGC
CasB14Rev  2105       CTTCTCACCGGTGTGGGTGCGCTCGTGCAG
CasB15Rev  2106       CTTCTCACCGGTGTGGCTGCGCTGGTGGCC
CasB16Rev  2107       CTTCTCACCGGTGTGGGTGCGCCAGTGGCT
CasC1Rev   2108       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGC
                      GGATGTGGCGG
CasC2Rev   2109       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATCT
                      TGGTGTGGCGC
CasC3Rev   2110       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGC
                      GGTAGTGGCG
CasC4Rev   2111       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGCATGT
                      TGTGGTGGCGC
CasC5Rev   2112       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGC
                      GGTAGTGGCG
CasC6Rev   2113       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGATGC
                      GCTTGTGGCGG
CasC7Rev   2114       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGCACGC
                      GCTTGTGGCG
CasC8Rev   2115       GATCGGATCCGCCGCCGTCCTTCTGGCGCAGGTGGGTGC
                      GCTGGTGGCG
ScaRev     2116       GTCATGCCATCCGTAAGATGC
GSFwd      2117       GGCGGATCCTATCCGTATGATGTG
Zif1Fwd    2118       AGAGAGAGAGAGATCTATGGCGGAAGAACGTCCGTATGC
                      GTGCCCGGTGGAAAG
Zif1Rev    2119       AGCCGGATCCCAAACACCGGTATGAATACGAATATGACG
                      GG
pETRev1    2120       AGTGTAGCGGTCACGCTGC



Table 10. Oligonucleotides used for PCR construction of rapid zinc finger library.
Annealing sequences are shown in bold, restriction sites are underlined.

3-Finger Library Peptides

The 2 natural zinc finger modules for each construct are appended to the C-terminus of
Zif268 finger 1 (as in Example 4, library 2). Hence, a plasmid construct containing
Zif268 finger 1 and appropriate restriction sites for cloning of the two natural finger
modules is also prepared. The construction and cloning procedure for the 3-finger
libraries follows (see also Figure 6).

(a) The plasmid pET23a/TFZ-HA was assembled by PCR amplification of
plasmid pTFZ-KOX (described in co-owned WO 01/53480) with primers AS1 and AS2.
The sequences of these primers are as follows:

AS1: CGATGGATCCATGGGAGAGAAGGCGCTGC (SEQ ID NO:2126)
AS2: GCGTAAAGCTTACGCATAATCCGGCACATCATACGGATAAGAG
CCGCCGCCGTCCTTCTGTCTTAAATGGATTT (SEQ ID NO:2127)

The PCR product was gel purified and digested with BamHI and HindIII, then
repurified and cloned into BamHI/HindIII-digested pET23a vector (Novagen), yielding
pET23a/TFZ-HA. A number of clones were picked and sequenced to verify the
correctness of the inserts.

(b) A fragment of approximately 1.2 kb is amplified from the vector
pET23a/TFZ-HA, using the primers ScaRev and GSFwd (Table 10). This fragment
contains the HA-epitope tag sequence (YPYDVPDYA* (SEQ ID NO:2122)) and part of
the GGGS (SEQ ID NO:1988) linker sequence at the 5' end. Additionally, the GSFwd
primer adds a BamHI site at the extreme 5' end. The ScaRev primer does not contain a
restriction site, but a ScaI site from the vector is present approximately 40 bp downstream
of the primer binding site. This fragment is cut with BamHI and ScaI and inserted into
similarly cut pET23a.

(c) Zif268 finger 1 is then amplified using the PCR primers Zif1Fwd and Zif1Rev
(Table 10), which add a BglII site at the 5' end and both AgeI and BamHI sites at the 3'
end. This construct is then cut with BglII and BamHI and inserted into the vector
construct made in step (b), which has been linearised with BamHI. At this stage the new
construct, termed pET23aZif1HA, is sequenced to find correctly oriented zinc finger
inserts.

(d) Oligonucleotides encoding zinc finger modules for the C-terminus of the 3-
finger constructs (cassette C) are amplified using the primers CasCxFwd and CasCxRev
(where x is 1 to 8, see Table 10). These cassettes are then digested with the restriction
enzyme BamHI, and inserted into BamHI cut, dephosphorylated pET23aZif1HA. At this
stage the new vector construct is not recircularised.

(e) Oligonucleotides encoding zinc finger modules for cassette B are amplified
using primers CasBxFwd and CasBxRev (where x is 9 to 16, see Table 10). These
fragments are cut with the enzymes XmaI and AgeI, at 37°C for 1-2 hours. The linear
vector produced in stage (d) above is also cut with AgeI and XmaI (as described), and
dephosphorylated. Digested cassette B fragments are ligated into AgeI, XmaI cut vector,
in the presence of the restriction enzymes AgeI and XmaI, at room temperature for 16
hours. During this incubation incorrectly ligated fragments are re-digested and re-ligated
repeatedly, until the majority (or all) of the inserts are in the desired orientation. Correct
3-finger constructs have the assembly depicted in Figure 6.

(f) Finally, 3-finger constructs are amplified from the ligated vector (produced in
step (e)) using the primers pETFwd1 (Table 5) and pETRev1 (Table 10). 1 µl of each
ligation mixture is amplified in a 10 µl (total volume) PCR reaction for 30 cycles.
Alternatively, the ligated vector can be transformed into bacteria to produce samples
containing single zinc finger clones.

The above procedure results in the majority of PCR products being the correct 3-finger
constructs, so that any incorrect fragments will not significantly affect the selection
protocol, and the PCR products can be used for screening without further processing.
Alternatively, 3-finger PCR products may be purified from an agarose gel before use.
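
The self-correcting ligation of step (e) can be rationalised from the recognition sequences of the two enzymes: AgeI cuts A^CCGGT and XmaI cuts C^CCGGG, so both leave the same 5'-CCGG overhang. The Python sketch below is an illustration only; the function names are hypothetical, the small codon dictionary covers just the codons needed here, and the GAGAAGCCC sequence following the XmaI site is taken from the forward primers in Table 10.

```python
SITES = {"AgeI": "ACCGGT", "XmaI": "CCCGGG"}  # both enzymes cut after base 1

# Only the codons needed to read the linker region.
CODONS = {"ACC": "T", "GGG": "G", "GAG": "E", "AAG": "K", "CCC": "P"}


def junction(upstream_site, downstream_site):
    """Top-strand sequence formed when the upstream half of one cut site is
    ligated to the downstream half of another (both cut after base 1)."""
    return upstream_site[0] + downstream_site[1:]


def cutters(seq):
    """Names of the enzymes whose recognition site equals seq."""
    return [name for name, site in SITES.items() if site == seq]


def translate(dna):
    """Translate an in-frame sequence using the limited codon table above."""
    return "".join(CODONS[dna[i:i + 3]] for i in range(0, len(dna), 3))


for up in SITES:
    for down in SITES:
        seq = junction(SITES[up], SITES[down])
        status = "re-cut by " + cutters(seq)[0] if cutters(seq) else "stable"
        print(f"{up} end joined to {down} end: {seq} ({status})")

# Correct orientation: an AgeI-cut 3' end joined to an XmaI-cut 5' end.
linker_dna = junction(SITES["AgeI"], SITES["XmaI"]) + "GAGAAGCCC"
print(linker_dna, "encodes", translate(linker_dna))
```

In this sketch, junctions between like ends regenerate a cleavable AgeI or XmaI site, while the AgeI/XmaI hybrid junction is recognised by neither enzyme and reads in frame as TGEKP. This is consistent with the description that incorrectly ligated fragments are re-digested during the incubation while correctly oriented inserts accumulate.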

b. Screening of the Library Against 5'-GCG-TGG-GCG-3'

Members of the zinc finger library can be screened against the desired target site from a
mixed population of clones, or from individual clones as described in Example 4,
Protocol A or Protocol B (above), respectively. The target site for the screen is produced
by annealing the oligonucleotides Zif.b site (AS154) and Zif site RC (AS155), as before.
Template for protein expression is in each case made by PCR using primers pETFwd1
(Table 5) and pETRev1 (Table 10). 1 µl of each PCR reaction is used to express protein
and screen for binding to the Zif site in the manner described in Example 4. The DNA
corresponding to the samples giving the highest fluorescence signals is collected, purified
from a 1% TAE-agarose gel, and sequenced to determine the sequence of the optimal
binding 3-finger peptide.

Example 6: Reduced Human Zinc Finger Module Library for Universal DNA
Recognition.

A library system similar to that described in Example 5 can be constructed using zinc
finger modules from databases such as those in Examples 1, 2 and 3 to select 2-finger
units which bind any 2-finger (6 bp) recognition sequence. There are only 4096 (= 4^6)
unique 6 bp sequences; therefore, a 2-finger library of natural zinc fingers (from specific
animals, plants or fungi) can easily be constructed with enough variability to provide a
specific 2-finger combination for optimal binding to any 6 bp target site. Again, to
reduce the number of natural zinc finger modules that have to be constructed, a small
selection of natural zinc finger modules (e.g., 3) is chosen for each 3 bp binding
sequence (according to their predicted or determined recognition sequence). There are 64
(= 4^3) possible 3 bp binding sequences, so in the first instance fewer than 200 (i.e., 192)
natural zinc finger modules are constructed. These approximately 200 zinc finger modules
can be in either of 2 possible positions in the 2-finger construct, which gives approximately
40,000 (= 200^2) combinations of fingers to bind the 4096 possible 6 bp target sites. As in
Example 5, these 2-finger units are attached to Zif268 finger 1, which acts as an anchor
for DNA recognition.
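
The library sizes quoted above follow from simple counting, which the short Python sketch below reproduces. It is illustrative only; the variable names are hypothetical and the figure of 3 modules per triplet is the example value given in the text.

```python
from itertools import product

BASES = "ACGT"

# Enumerate every 3 bp and 6 bp site rather than relying on the formula alone.
triplets = ["".join(p) for p in product(BASES, repeat=3)]
hexamers = ["".join(p) for p in product(BASES, repeat=6)]

modules_per_triplet = 3                       # example value from the text
modules = modules_per_triplet * len(triplets)
two_finger_exact = modules ** 2               # each module in either position
two_finger_rounded = 200 ** 2                 # the rounded figure used in the text

print(f"3 bp sites: {len(triplets)}")
print(f"6 bp sites: {len(hexamers)}")
print(f"modules to construct: {modules}")
print(f"2-finger combinations: {two_finger_exact} (about {two_finger_rounded})")
print(f"combinations per 6 bp site: {two_finger_exact / len(hexamers):.0f}")
```

With 3 modules per triplet, each 6 bp site is addressed by 9 candidate 2-finger combinations on average, which is the redundancy that the selection step then resolves.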

a. Library Construction

The selected zinc finger modules are reverse translated from their amino acid sequences
and synthesised as oligonucleotides. Double stranded zinc finger cassettes for both N-
terminal and C-terminal fingers are created by PCR using primers specific for the relevant
zinc finger module. Each zinc finger module is amplified in 2 separate reactions, as
described in Example 5. The first PCR reaction uses primers which add TGEKP (SEQ
ID NO:3) linker peptides and AgeI and XmaI restriction sites to the 3' and 5' ends,
respectively, to generate cassette B fragments. The second PCR reaction generates
cassette C fragments by adding a TGEKP (SEQ ID NO:3) linker and an XmaI site at the
5' end (this primer is the same as that used in cassette B production), and a sequence
encoding the sequence LRQKDGGGS (SEQ ID NO:2125) and a BamHI restriction site at
the 3' end. The final constructs are similar to that represented in Figure 6.

b. Library Selection 

The collection of 3-finger zinc finger peptides produced above can be used to obtain
specific domains for binding desired target sequences. Two exemplary approaches are
described below.

i) Non-Cloning Selections

A library constructed as described herein can be used to select optimal zinc finger
domains for binding to any specified binding site. For instance, to select a peptide which
binds the sequence 5'-GGA-TAA-3', the binding site formed by annealing the
oligonucleotides #1#5#6.b and #1#5#6 RC (Table 6, above) can be used as a target site
(5'-GGA-TAA-GCG-3'). Selection of a zinc finger domain to bind such a target can be
conducted, for example, in the manner described in Example 4. Briefly, the zinc finger
library is diluted into 100 or more sub-libraries, which are screened as described above.
The most active sub-libraries collected are further diluted to create much smaller sub-
libraries, which are screened again, and so on. Following such a protocol, a library of
40,000 members can be fully screened and a high-affinity binder selected in just 3 rounds.
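
The figure of 3 rounds can be rationalised with a simple calculation. The Python sketch below is a hedged illustration: it assumes an idealised scheme in which each round splits the current pool into 100 equal sub-libraries and carries forward only the single most active one (a simplification of the protocol described), and the function name is hypothetical.

```python
import math


def pool_sizes(library_size, pools_per_round):
    """Sub-library size after each round of splitting, until one member remains."""
    sizes = []
    pool = library_size
    while pool > 1:
        pool = math.ceil(pool / pools_per_round)
        sizes.append(pool)
    return sizes


sizes = pool_sizes(40_000, 100)
print(f"sub-library size after each round: {sizes}")
print(f"rounds needed: {len(sizes)}")
```

Under this idealisation, a 40,000-member library is reduced to sub-libraries of 400, then 4, then 1 member, that is, 3 rounds when 100 sub-libraries are screened per round.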

This selection procedure provides an extremely rapid method to select zinc finger
peptides to bind any desired target site. The procedure also has the advantages of
eliminating the need for cloning (as is required for methods such as phage display, see
below), and is not limited by library size.

ii) Phage Library Selections

Zinc finger polypeptide phage display libraries are made and used to select clones
encoding peptides that bind the desired nucleotide sequence, as described in co-owned
WO 98/53057. An exemplary phage display library contains peptides which bind target
sites with the sequence 5'-XXX-XXX-GCG-3', where X can be any nucleotide. Hence,
libraries of phage can be selected using the same target sites as described above. The
selection protocol for zinc fingers displayed on phage is briefly described below.

Protocol

The selection protocol is adapted from that described in co-owned international patent
application WO 98/53057.

The 3-finger constructs of the present Example are PCR amplified using universal
forward and reverse primers (called NatPhageF and NatPhageR, respectively), which
contain sites for SfiI and NotI, respectively.

NatPhageF: GCAACTGCGGCCCAGCCGGCCATGGCAGAGGAACGCCCGTATG (SEQ ID
NO:2128)

NatPhageR: GAGTCATTCTGCGGCCGCGTCCTTCTGGCGCAGGTG (SEQ ID NO:2129)

Backward PCR primers in addition introduce Met-Ala-Glu as the first three amino acid
residues of the zinc finger polypeptides, and these are followed by the residues of the
wild type or library zinc finger polypeptides as required. Cloning overhangs are
produced by digestion with SfiI and NotI where necessary. Nucleic acid encoding zinc
finger polypeptide fragments is ligated into similarly prepared Fd-Tet-SN vector. This is
a derivative of fd-tet-DOG1 (Hoogenboom et al. (1991) Nucl. Acids Res. 19:4133-4137),
in which a section of the pelB leader and a restriction site for the enzyme SfiI (underlined)
have been added by site-directed mutagenesis using the oligonucleotide:

5' CTCCTGCAGTTGGACCTGTGCCATGGCCGGCTGGGCCGCATA
GAATGGAACAACTAAAGC 3' (SEQ ID NO:2130)

that anneals in the region of the polylinker. Electrocompetent DH5α cells are
transformed with recombinant vector in 200 ng aliquots, grown for 1 hour in 2xTY
medium with 1% glucose, and plated on TYE containing 15 µg/ml tetracycline and 1%
glucose.

To generate phage for selections, tetracycline resistant colonies are transferred from
plates into 2xTY medium (16 g/litre Bacto tryptone, 10 g/litre Bacto yeast extract, 5 g/litre
NaCl) containing 50 µM ZnCl2 and 15 µg/ml tetracycline, and cultured overnight at 30°C
in a shaking incubator. Cleared culture supernatant containing phage particles is obtained
by centrifuging at 300 x g for 5 minutes.

Double stranded binding sites for use in selections are generated by annealing
complementary oligonucleotides, one of which is biotinylated.

Biotinylated DNA target sites (1 pmol) are bound to streptavidin-coated wells (Roche).
Phage supernatant solutions are diluted 1:10 in PBS selection buffer (PBS containing 50
µM ZnCl2, 2% Marvel, 1% Tween, 20 µg/ml sonicated salmon sperm DNA, and 10-fold
excess of competitor DNA), and 200 µl is applied to each well for 1 hour at 20°C. After
this time, the wells are emptied and washed 18 times with PBS containing 50 µM ZnCl2
and 1% Tween and 2 times in PBS containing 50 µM ZnCl2. Retained phage are eluted in
100 µl 0.1M triethylamine and neutralised with an equal volume of 1M Tris (pH 7.4).
Logarithmic-phase E. coli JM109 (100 µl) are infected with eluted phage (100 µl), and
used to prepare phage supernatants for subsequent rounds of selection. After 4 rounds of
selection, a 'pool' or 'mini-population' of phage is obtained, which binds the specified
target sequence. These pools of phage can be stored at -70°C for later use. Additionally,
E. coli infected with these phage pools can be plated to obtain individual clones, which
can be tested by ELISA for binding affinity and specificity to obtain the 'best' clone (see
Example 9, Quality Control).
Example 7: Complete Human Zinc Finger Module Library for Universal DNA
Recognition.

A complete, or nearly complete, library containing all zinc finger sequences which bind
a particular target site can be constructed using zinc finger modules to select 2-finger (or
3-finger) units which bind any 6 bp (or 9 bp) recognition sequence. Two exemplary
methods for construction of such a library are described.

a. Oligonucleotide-Based Library Construction. 

All zinc finger modules may be synthesised as a single stranded oligonucleotide, as
described in Example 4. Zinc finger modules are made double stranded and TGEKP
(SEQ ID NO:3) linkers added by PCR with 5' and 3' primers specific for each individual
zinc finger module, to make cassettes. These cassettes can then be recombined, as
described in Example 5, to make random or deliberate combinations of zinc finger
modules comprising 2, 3, or more linked fingers.

b. PCR-Based Library Construction.

Zinc finger proteins (especially of the Cys2His2 family) form the second most abundant
family of proteins in the human genome. Furthermore, in nature, zinc finger modules are
often linked by the canonical linker peptide TGEKP (SEQ ID NO:3), which begins
immediately after the second zinc-coordinating histidine residue. Therefore, the peptide
sequence HTGEKP (SEQ ID NO:2131) is commonly found between natural zinc finger
modules. Because of this consensus sequence, it has been possible to clone natural zinc
finger modules from the human genome (Becker, K.G., Nagel, J.W., Canning, R.D.,
Biddison, W.E., Ozato, K. & Drew, P.D. (1995) Hum. Mol. Genet. 4: 685-691; Bray, P.,
Lichter, P., Thiesen, H.-J., Ward, D.C. & Dawid, I.B. (1991) Proc. Natl. Acad. Sci. USA
88: 9563-9567), and the Arabidopsis genome (Meissner, R. & Michael, A.J. (1997) Plant
Mol. Biol. 33: 615-624), using redundant primers for PCR. See also Pellegrino et al.
(1991) Proc. Natl. Acad. Sci. USA 88:671-675. It is preferable to use genomic DNA or a
genomic DNA (gDNA) library, rather than a cDNA library, because transcription factors,
such as zinc finger proteins, are strongly regulated during the cell cycle, during development
and in response to extracellular signals. Hence, a cDNA library will probably not contain the
majority of zinc finger proteins, and will be biased towards highly expressed proteins.

A suitable protocol for the PCR-extraction of zinc finger modules from human genomic
DNA follows:

Genomic DNA is purified directly from human cells, or provided by a gDNA library.
gDNA libraries are preferable as they are commercially available (for example from
Clontech, ATCC, Stratagene etc.) and can be easily manipulated. PCR to extract zinc
finger modules can be conducted directly on purified gDNA, or the gDNA library can be
screened for zinc fingers containing the HTGEKP (SEQ ID NO:2131) motif before
carrying out PCR. To screen the gDNA library, any method known to one of skill in the
art, e.g. colony hybridisation, can be used. Phage containing gDNA inserts are plated
onto Escherichia coli XL-1 Blue bacterial lawns. At least 10^6 phage plaques are
transferred to replica filters and screened with, for example, a 27-mer 32P-radiolabelled
degenerate oligonucleotide, which anneals to the conserved linker region of zinc finger
proteins and adjacent sequences. The sequence of a suitable degenerate probe (SEQ ID
NO:2132), and the amino acid sequence (SEQ ID NO:2133) to which it corresponds, are
shown below.

C(G/T)(C/G)  A(T/C)(C/G)  CA(C/T)  AC(C/G)  GG(C/G)  GA(G/A)  AA(G/A)  CC(C/T)  T(A/T)(C/T)

R/L          I/T/M        H        T        G        E        K        P        Y/F
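
The correspondence between the degenerate probe and the peptide it encodes can be checked with a short script. The following is a minimal sketch, assuming the probe is read as nine codons with the alternative bases shown above and translated with the standard genetic code; the names PROBE, encoded_residues and degeneracy are illustrative and are not part of the disclosure.

```python
from itertools import product

# Standard genetic code, codons ordered T, C, A, G at each position.
BASES = "TCAG"
AMINO = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {"".join(c): aa for c, aa in zip(product(BASES, repeat=3), AMINO)}

# The 27-mer probe, one tuple per codon; each string lists the bases
# allowed at that codon position (assumed reading of the probe).
PROBE = [
    ("C", "GT", "CG"), ("A", "TC", "CG"), ("C", "A", "CT"),
    ("A", "C", "CG"), ("G", "G", "CG"), ("G", "A", "GA"),
    ("A", "A", "GA"), ("C", "C", "CT"), ("T", "AT", "CT"),
]

def encoded_residues(codon):
    """Return, as a sorted string, the amino acids a degenerate codon can encode."""
    return "".join(sorted({CODON_TABLE["".join(c)] for c in product(*codon)}))

def degeneracy(probe):
    """Count the distinct oligonucleotides represented by the degenerate probe."""
    total = 1
    for codon in probe:
        for choices in codon:
            total *= len(choices)
    return total

residues = [encoded_residues(codon) for codon in PROBE]
print(" ".join(residues))   # one group of one-letter codes per codon
print(degeneracy(PROBE))    # number of sequences in the probe mixture
```

Each group printed by the script lists the residues permitted at one position of the peptide, so the output can be compared directly with the amino acid line above.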

Hybridisation is performed, e.g., for 16 hours at 42-50 °C, following which filters are
washed 3-5 times, to remove non-specifically bound probe, in 0.2x standard saline citrate
(SSC)/0.1% SDS. Filters are then subjected to autoradiography or phosphorimaging to
determine positive plaques.

Positive plaques are picked into log-phase E. coli XL-1 Blue bacterial cultures and the
phage are harvested for PCR. 1 µl phage supernatant is added to 49 µl PCR pre-mix,
containing the oligonucleotide primers TGEKPfor (SEQ ID NO:2134) and TGEKPrev
(SEQ ID NO:2135) (shown below, annealing sequence in bold), and zinc finger modules
are amplified by 30 cycles of PCR. TGEKPfor (SEQ ID NO:2134) and TGEKPrev (SEQ
ID NO:2135) also contain XbaI and EcoRI restriction sites (underlined), respectively.
PCR products are separated on 1.5% TAE-agarose gels and fragments of approximately
120 bp (corresponding to 1 zinc finger module plus flanking sequences) are purified, as
described in Example 4. Additionally, fragments of approximately 220 bp, corresponding
to natural 2-finger units, can also be collected and used. Such products can be digested
with XbaI and EcoRI and cloned into a vector that has been digested so as to generate
compatible ends, such as, for example, pcDNA3.1(-) (Invitrogen) digested with EcoRI
and XbaI. Such a vector pool can then be used as a source for natural 1- or 2-zinc finger
modules, from which to construct 2- or 3-zinc finger peptides for selections as described
above. Zinc finger modules for cassette B can be amplified from such vectors using the
universal primers TGEKPXma (SEQ ID NO:2136) and TGEKPAge (SEQ ID NO:2137),
which anneal to the conserved TGEKP (SEQ ID NO:3) linker regions and add restriction
sites for the enzymes XmaI at the 5' terminus and AgeI at the 3' terminus, respectively
(restriction sites underlined). Cassette C units can be amplified using the primers
TGEKPXma (SEQ ID NO:2136) and TGEKPend (SEQ ID NO:2138), which adds a 3'
TRQKDGGGS (SEQ ID NO:2139) sequence incorporating a BamHI site (underlined, see
below). Two- and 3-finger constructs can then be constructed and screened as described
in the Examples above.

TGEKPfor: TTAGTCTAGA(C/G)CA(C/T)AC(C/G)GG(C/G)GA(G/A)AA(G/A)CC (SEQ ID NO:2134)

TGEKPrev: TACTGAATTC(G/A)GG(C/T)TT(C/T)TC(G/C)CC(G/C)GT(G/A)TG (SEQ ID NO:2135)

TGEKPXma: TCTAGA(C/G)CA(C/T)CCCGGGGA(G/A)AA(G/A)CC (SEQ ID NO:2136)

TGEKPAge: GAATTC(G/A)GG(C/T)TT(C/T)TCACCGGT(G/A)TG (SEQ ID NO:2137)

TGEKPend: AGTGTGGTGGAATTC(G/A)GGGGATCCGCCGCCGTC(C/T)TT(C/T)TG(G/C)CG(G/C)GT(G/A)TG (SEQ ID NO:2138)



Example 8. Microarray Analysis.



Microarray analysis can also be used to determine the binding site specificity of 2- and
3-finger peptides. For example, a 3-zinc finger library, with finger 1 fixed as Zif268 finger
one, recognises the sequence 5'-XXX-XXX-GCG-3', where X is any specified nucleotide.
Hence, there are 4096 (=4^6) unique binding sites for such a library. All 4096 of these
sites can be arrayed onto a single glass slide, allowing a specified 2-finger peptide to be
screened against every possible binding site at once. A suitable protocol for such an
experiment is described in Martha L. Bulyk, Xiaohua Huang, Yen Choo, & George
M. Church (Proc. Natl. Acad. Sci. USA: Vol. 98, No. 13, 7158-7163, June 19, 2001)
which is incorporated, by reference, in its entirety. See also co-owned WO 01/25417, the
disclosure of which is hereby incorporated by reference in its entirety.

The amount of binding to each target sequence can be visualised and quantified using
simple fluorescence measurements. For example, the zinc finger peptide can be
expressed in vitro or on the surface of phage. Isolated zinc finger peptides may contain
an epitope tag for labelling purposes, whereas bound phage can be detected using a
primary antibody against a phage coat protein, such as gVIII. A secondary antibody, such
as one conjugated to R-phycoerythrin, may be used to provide a visible signal when a
suitable substrate is applied.
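
The set of sites to be arrayed in this Example can be enumerated programmatically. The following is a minimal sketch, assuming sites are written 5' to 3' with the two variable triplets followed by the fixed GCG triplet; the names FIXED_TRIPLET and all_target_sites are illustrative only.

```python
from itertools import product

FIXED_TRIPLET = "GCG"  # triplet bound by the fixed finger 1

def all_target_sites():
    """Return every 9 bp site of the form XXXXXXGCG, written 5' to 3'."""
    return ["".join(variable) + FIXED_TRIPLET
            for variable in product("ACGT", repeat=6)]

sites = all_target_sites()
print(len(sites))  # number of array features needed, 4**6
```

Because the list is generated in a fixed order, the index of each site can serve as its array coordinate when the fluorescence signals are later matched back to sequences.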



Example 9. Quality Control.



Particular 2- or 3-finger peptides can be screened to determine their specificity or affinity,
as desired.

a. Phage ELISA Assay

Phage supernatants from Round 4 of selection (Example 6, supra) are used to infect
E. coli JM109 bacteria, and grown to prepare fresh supernatants for zinc finger phage
ELISA, using standard procedures as described previously (Choo, Y. & Klug, A. (1994)
Proc. Natl. Acad. Sci. USA 91, 11163-11167; Choo, Y. & Klug, A. (1994) Proc. Natl.
Acad. Sci. USA 91, 11168-11172). Briefly, 5'-biotinylated, positionally randomised
oligonucleotide libraries, containing Zif268 binding site variants, are synthesised by
annealing complementary oligonucleotides as described supra. DNA libraries are added
to streptavidin-coated ELISA wells (Boehringer-Mannheim) in PBS containing 50 µM
ZnCl2 (PBS/Zn). Phage solution (overnight bacterial culture supernatant diluted 1:10 in
PBS/Zn containing 2% Marvel, 1% Tween and 20 µg/ml sonicated salmon sperm DNA) is
applied to each well (50 µl/well). Binding is allowed to proceed for one hour at 20°C.
Unbound phage are removed by washing 7 times with PBS/Zn containing 1% Tween,
then 3 times with PBS/Zn. Bound phage are detected by ELISA using horseradish
peroxidase-conjugated anti-M13 IgG (Pharmacia Biotech) and the colourimetric signal is
quantitated using SOFTMAX 2.32 (Molecular Devices).

For rapid validation, the entire population of phage from Round 4 selection can be
assayed in two ELISA wells: one containing the target DNA binding site, and one
containing a control DNA binding site with between 1 and 5 base changes from the target
sequence. A selection is deemed to be successful if the ELISA signal (representing DNA
binding) is higher in the target well than in the control well.

The higher the signal measured above, the greater the population of specific binding
clones. However, individual low values for such a procedure do not necessarily indicate
a failure of the selection, as there may be individual high affinity / specificity clones
within the round 4 phage population that may be masked by other non-specific clones.
Nevertheless, this assay provides a quick profile of the overall quality of selection.
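
The two-well rapid validation reduces to two simple checks: that the control site differs from the target at between 1 and 5 positions, and that the target well gives the higher signal. The following is a minimal sketch of those checks; the function names, the example sites and the absorbance values are hypothetical and serve only as illustration.

```python
def count_base_changes(target, control):
    """Count positions at which the control site differs from the target site."""
    if len(target) != len(control):
        raise ValueError("target and control sites must have the same length")
    return sum(a != b for a, b in zip(target, control))

def is_valid_control(target, control):
    """A control site carries between 1 and 5 base changes from the target."""
    return 1 <= count_base_changes(target, control) <= 5

def selection_successful(target_signal, control_signal):
    """The selection is deemed successful if the target well signal is higher."""
    return target_signal > control_signal

# Hypothetical example values, for illustration only.
target_site = "GCGTGGGCG"
control_site = "GCGTAAGCG"
print(is_valid_control(target_site, control_site))
print(selection_successful(target_signal=1.20, control_signal=0.30))
```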



For a more detailed validation, individual phage clones are recovered from Round 4 by
plating out infected bacterial colonies on agar. Fresh phage supernatants are prepared
from these colonies and assayed by ELISA, as described above.

Finally, the coding sequence of individual zinc finger clones can be amplified by PCR
using external primers complementary to phage sequence, and the PCR products are then
sequenced to determine the amino acid sequence of the selected zinc fingers.

As an alternative, individual 3-finger peptides can be analysed by gel-shift assays or by
microarray screening, as described infra. See also WO 00/41566, WO 00/42219 and
WO 01/25417.

b. Gel-Shift Assay 

Peptides are assayed using 32P end-labelled synthetic oligonucleotide duplexes containing
the appropriate binding site sequences.

DNA binding reactions contain the appropriate zinc-finger peptide, binding site and 1 µg
competitor DNA (e.g., poly dI-dC or salmon sperm DNA) in a total volume of 10 µl,
which contains: 20 mM Bis-tris propane (pH 7.0), 100 mM NaCl, 5 mM MgCl2, 50 µM
ZnCl2, 5 mM DTT, 0.1 mg/ml BSA, 0.1% Nonidet P40. Incubations are performed at
room temperature for 1 hour.

To determine the concentration of zinc finger peptide produced in the in vitro expression
system, crude protein samples are used in gel-shift assays against a dilution series of the
appropriate binding site. Binding site concentration is always well above the Kd of the
peptide, but ranges from a higher concentration than the peptide (80 mM), at which all
available peptide binds DNA, to a lower concentration (3-5 mM), at which all DNA is
bound. Controls are carried out to ensure that binding sites are not shifted (i.e., bound) in
the absence of zinc finger peptide. The reaction mixtures are then separated on a 7%
native polyacrylamide gel. Radioactive signals are quantitated by PhosphorImager
analysis to determine the amount of shifted binding site, and hence, the concentration of
active zinc finger peptide.

WO 02/099084 PCT/US02/22272

Dissociation constants (Kd) are determined in parallel to the calculation of active peptide
concentration. For determination of Kd, serial 3-, 4- or 5-fold dilutions of crude peptide are
made and incubated with radiolabeled binding site (10 pM - 10 nM depending on the
peptide), as above. Samples are run on 7% native polyacrylamide gels and the
radioactive signals quantitated by PhosphorImager analysis. The data are then analysed
according to linear transformation of the binding equation and plotted in CA-Cricket
Graph III (Computer Associates Inc. NY) to generate the apparent dissociation constants.
The Kd values reported are the average of at least two separate determinations.
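
The particular linear transformation used is not specified here. As one possible illustration, the following sketch assumes a simple 1:1 binding model in which the fraction of binding site shifted is theta = [P] / (Kd + [P]), with the binding site concentration low enough that free peptide approximates total active peptide. Under that assumption the double-reciprocal form 1/theta = 1 + Kd x (1/[P]) is a straight line whose slope is the apparent Kd. The function name fit_kd and the synthetic data are hypothetical.

```python
def fit_kd(peptide_conc, fraction_bound):
    """Estimate Kd as the slope of a least-squares line through 1/theta vs 1/[P]."""
    xs = [1.0 / p for p in peptide_conc]
    ys = [1.0 / f for f in fraction_bound]
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    slope = sxy / sxx                    # slope of the line is the apparent Kd
    intercept = mean_y - slope * mean_x  # close to 1 for ideal 1:1 binding
    return slope, intercept

# Synthetic, noise-free data for a 5-fold dilution series (illustration only).
TRUE_KD = 2e-9  # molar
concs = [TRUE_KD * 5 ** k for k in range(-2, 3)]
theta = [p / (TRUE_KD + p) for p in concs]

kd, intercept = fit_kd(concs, theta)
print(kd, intercept)
```

With real gel-shift data the points at the lowest peptide concentrations dominate a double-reciprocal fit, which is one reason for averaging at least two separate determinations.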

c. Microarray Assay

Selected zinc finger domains can also be assayed for binding site specificity using the
microarray analysis outlined in Example 8.

All publications mentioned in the above specification are herein incorporated by
reference. Various modifications and variations of the described methods and system of
the invention will be apparent to those skilled in the art without departing from the scope
and spirit of the invention. Although the invention has been described in connection with
specific preferred embodiments, it should be understood that the invention as claimed
should not be unduly limited to such specific embodiments. Indeed, various
modifications of the described modes for carrying out the invention which are apparent to
those skilled in molecular biology or related fields are intended to be within the scope of
the following claims.
CLAIMS



1. A composite binding polypeptide comprising a first natural binding domain
derived from a first natural binding polypeptide, and a second natural binding domain
derived from a second natural binding polypeptide, wherein said first and second natural
binding polypeptides may be the same or different; which polypeptide binds to a target,
said target differing from the natural target of both the first and the second binding
polypeptides.

2. A composite polypeptide according to claim 1, wherein said first and second 
natural binding polypeptides are different polypeptides. 

3. A composite polypeptide according to claim 1 or claim 2, comprising three or 
more natural binding domains. 

4. A composite polypeptide according to any preceding claim, wherein the binding 
domains are nucleic acid binding domains. 

5. A composite polypeptide according to claim 4, which is a nucleic acid binding 
polypeptide. 

6. A composite polypeptide according to claim 4 or claim 5 which is a zinc finger 
polypeptide, and the natural binding domains are zinc finger domains. 

7. A composite polypeptide according to claim 6, which comprises a Cys2-His2 zinc 
finger binding domain. 

8. A composite polypeptide according to claim 6 or claim 7, which comprises a 
Cys3-His zinc finger binding domain. 

9. A composite polypeptide according to any preceding claim, which comprises 6 or 
more natural binding domains. 
10. A composite polypeptide according to claim 9, wherein 6 natural binding domains
are arranged in a 3x2 conformation, separated by linker sequences.

11. A chimeric polypeptide comprising: 

(a) a binding polypeptide according to any preceding claim, and 

(b) a biological effector domain. 

11. A library of natural binding domains. 

12. A library according to claim 11, comprising a plurality of natural binding domains
from which a polypeptide according to any one of claims 1 to 10 can be assembled.

13. A library of natural zinc finger nucleic acid binding domains, wherein said zinc 
finger domains comprise a linker attached thereto. 

14. A library according to claim 13, wherein the linker comprises the sequence 
TGEKP. 

15. A method for selecting a binding polypeptide capable of binding to a target site, 
comprising: 

(a) providing a library of natural binding domains; 

(b) assembling two or more of said domains to form a composite polypeptide; 

(c) screening said composite polypeptide against the target site in order to 
determine its ability to bind the target site. 

16. A method according to claim 15, wherein the natural binding domains are zinc 
finger binding domains. 

17. A method according to claim 15 or claim 16, wherein two or more composite
polypeptides comprising two or more domains which are selected for binding to two or
more target sites are combined to provide a composite polypeptide which binds to an
aggregate binding site comprising the two or more target binding sites.

18. A method for designing a composite binding polypeptide, comprising: 

(a) providing information defining a target site; 

(b) selecting, from a database of natural binding domains, sequences of binding 
domains which are predicted to bind to the target site by the application of one or more 
rules which define target binding interactions for the binding domains; and 

(c) displaying the sequences of the binding domains, separated by linker 
sequences, and optionally assembling the binding polypeptide from a library of said 
domains. 

19. A method according to claim 18, wherein the binding domains are zinc finger 
domains. 

20. A method according to claim 19, wherein the zinc fingers are considered to bind 
to a nucleic acid triplet and domains are selected according to one or more of the 
following rules: 

(a) if the 5' base in the triplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if the 5' base in the triplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if the 5' base in the triplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if the 5' base in the triplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if the central base in the triplet is G, then position +3 in the α-helix is His;

(f) if the central base in the triplet is A, then position +3 in the α-helix is Asn;

(g) if the central base in the triplet is T, then position +3 in the α-helix is Ala, Ser
or Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if the central base in the triplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if the 3' base in the triplet is G, then position -1 in the α-helix is Arg;

(j) if the 3' base in the triplet is A, then position -1 in the α-helix is Gln;

(k) if the 3' base in the triplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if the 3' base in the triplet is C, then position -1 in the α-helix is Asp.

21. A method according to claim 19, wherein the zinc fingers are considered to bind 
to a nucleic acid quadruplet and domains are selected according to one or more of the 
following rules: 

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg or Lys;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Glu, Asn or
Val;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser, Thr, Val
or Lys;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix is Ser, Thr, Val,
Ala, Glu or Asn;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is His or Thr;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp or His;

(m) if base 1 in the quadruplet is G, then position +2 is Glu;

(n) if base 1 in the quadruplet is A, then position +2 is Arg or Gln;

(o) if base 1 in the quadruplet is C, then position +2 is Asn, Gln, Arg, His or Lys;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.
22. A method according to claim 19, wherein the zinc fingers are considered to bind
to a nucleic acid quadruplet and domains are selected according to one or more of the
following rules:

(a) if base 4 in the quadruplet is G, then position +6 in the α-helix is Arg; or
position +6 is Ser or Thr and position ++2 is Asp;

(b) if base 4 in the quadruplet is A, then position +6 in the α-helix is Gln and ++2
is not Asp;

(c) if base 4 in the quadruplet is T, then position +6 in the α-helix is Ser or Thr
and position ++2 is Asp;

(d) if base 4 in the quadruplet is C, then position +6 in the α-helix may be any
amino acid, provided that position ++2 in the α-helix is not Asp;

(e) if base 3 in the quadruplet is G, then position +3 in the α-helix is His;

(f) if base 3 in the quadruplet is A, then position +3 in the α-helix is Asn;

(g) if base 3 in the quadruplet is T, then position +3 in the α-helix is Ala, Ser or
Val; provided that if it is Ala, then one of the residues at -1 or +6 is a small residue;

(h) if base 3 in the quadruplet is C, then position +3 in the α-helix is Ser, Asp,
Glu, Leu, Thr or Val;

(i) if base 2 in the quadruplet is G, then position -1 in the α-helix is Arg;

(j) if base 2 in the quadruplet is A, then position -1 in the α-helix is Gln;

(k) if base 2 in the quadruplet is T, then position -1 in the α-helix is Asn or Gln;

(l) if base 2 in the quadruplet is C, then position -1 in the α-helix is Asp;

(m) if base 1 in the quadruplet is G, then position +2 is Asp;

(n) if base 1 in the quadruplet is A, then position +2 is not Asp;

(o) if base 1 in the quadruplet is C, then position +2 is not Asp;

(p) if base 1 in the quadruplet is T, then position +2 is Ser or Thr.

23. The method of any of claims 18-22, further comprising the step of synthesizing a 
polynucleotide encoding the binding polypeptide. 
24. A computer-implemented method for designing a zinc finger polypeptide,
comprising the steps of:

(a) providing a system comprising at least storage means for storing data relating 
to a library of zinc fingers; storage means for storing a rule table; means for inputting 
target nucleic acid sequence data; processing means for generating a result; and means for 
outputting the result; 

(b) inputting sequence data for a target nucleic acid molecule; 

(c) defining a first target zinc finger binding site in said nucleic acid molecule; 

(d) interrogating the zinc finger library and rule table storage means, comparing 
zinc fingers to the target zinc finger binding site according to the rule table and selecting 
zinc finger data identifying a zinc finger capable of binding to said target site; 

(e) defining at least one further target zinc finger binding site and repeating step 
(d); and 

(f) outputting the selected zinc finger data. 

25. A method according to claim 24, further comprising sending instructions to an 
automated chemical synthesis system to assemble a zinc finger polypeptide as defined by 
the zinc finger data obtained in (f). 

26. A method according to claim 25, wherein the zinc finger polypeptide is tested for 
binding to the target site, and data from said testing is used to select, from a plurality of 
candidates, a zinc finger polypeptide capable of binding to the target site. 

27. A method according to any one of claims 24 to 26, wherein two or more zinc 
finger polypeptides are combined to form a zinc finger polypeptide capable of binding to 
an aggregate binding site comprising two or more target sites. 

27. A method according to claim 24, wherein the rule table comprises rules as set 
forth in any one of claims 21 to 23. 
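
By way of illustration only, steps (b) to (f) of the computer-implemented method can be sketched as a lookup of library fingers against a rule table. The sketch below is a simplified assumption, not the claimed system: it uses only the single-position triplet rules (positions +6, +3 and -1), omits the ++2 conditions and the small-residue proviso attached to Ala, and ignores the order in which fingers are assembled. The library entries finger_A, finger_B and finger_C and all function names are hypothetical.

```python
# Simplified rule table: base -> residues allowed at one helix position.
# None means any residue is accepted. The ++2 conditions and the Ala
# proviso are deliberately left out of this sketch.
RULES = {
    "+6": {"G": {"Arg"}, "A": {"Gln"}, "T": {"Ser", "Thr"}, "C": None},
    "+3": {"G": {"His"}, "A": {"Asn"}, "T": {"Ala", "Ser", "Val"},
           "C": {"Ser", "Asp", "Glu", "Leu", "Thr", "Val"}},
    "-1": {"G": {"Arg"}, "A": {"Gln"}, "T": {"Asn", "Gln"}, "C": {"Asp"}},
}

# Hypothetical library: finger name -> residues at helix positions -1, +3, +6.
LIBRARY = {
    "finger_A": {"-1": "Arg", "+3": "Glu", "+6": "Arg"},
    "finger_B": {"-1": "Arg", "+3": "His", "+6": "Thr"},
    "finger_C": {"-1": "Gln", "+3": "Asn", "+6": "Gln"},
}

def finger_matches(finger, triplet):
    """Return True if the finger satisfies the rule for each base of the triplet.

    The 5' base is read by position +6, the central base by +3, the 3' base by -1.
    """
    for position, base in zip(("+6", "+3", "-1"), triplet):
        allowed = RULES[position][base]
        if allowed is not None and finger[position] not in allowed:
            return False
    return True

def select_fingers(target):
    """Split a 5'->3' target into triplets and list the matching library fingers."""
    if len(target) % 3 != 0:
        raise ValueError("target length must be a multiple of 3")
    triplets = [target[i:i + 3] for i in range(0, len(target), 3)]
    return [(t, sorted(name for name, f in LIBRARY.items() if finger_matches(f, t)))
            for t in triplets]

result = select_fingers("GCGTGGAAA")
for triplet, names in result:
    print(triplet, names)
```

Holding the rules as data rather than as code mirrors the separation between the rule table storage means and the zinc finger library storage means, so that an alternative rule set can be substituted without changing the selection logic.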